Abstract

We consider a task graph mapped on a set of homogeneous processors. We aim at minimizing the energy consumption
while enforcing two constraints: a prescribed bound on the execution time (or makespan), and a reliability
threshold. Dynamic voltage and frequency scaling (DVFS) is frequently used to reduce the energy consumption
of a schedule, but slowing down the execution of a task to save energy decreases the reliability of the
execution. In this work, to improve the reliability of a schedule while reducing the energy consumption, we allow for
the re-execution of some tasks. We assess the complexity of the tri-criteria scheduling problem (makespan, reliability,
energy) of deciding which tasks to re-execute, and at which speed each execution of a task should be done, with two
different speed models: either processors can have arbitrary speeds (CONTINUOUS model), or a processor can run at
a finite number of different speeds and change its speed during a computation (Vdd-Hopping model). We propose
several novel tri-criteria scheduling heuristics under the continuous speed model, and we evaluate them through a set
of simulations. The two best heuristics turn out to be very efficient and complementary.
1 Introduction



Energy-aware scheduling has proven an important issue in the past decade, both for economic and environmental
reasons. This holds true for traditional computer systems, not to mention battery-powered systems. More
precisely, a processor running at speed $s$ dissipates $s^3$ watts per unit of time [4, 6, 8], hence it consumes
$s^3 \times d$ joules when operated during $d$ units of time. To help reduce energy dissipation, processors can run at
different speeds. A widely used technique to reduce energy consumption is dynamic voltage and frequency scaling
(DVFS), also known as speed scaling [4, 6, 8]. Indeed, by lowering supply voltage, hence processor clock frequency,
it is possible to achieve important reductions in power consumption; faster speeds allow for a faster execution, but
they also lead to a much higher (supra-linear) power consumption. There are two popular models for processor speeds.
In the CONTINUOUS model, processors can have arbitrary speeds, and can vary them continuously in the interval
$[f_{\min}, f_{\max}]$. This model is unrealistic (any possible value of the speed, say $\sqrt{e^{\pi}}$, cannot be
obtained), but it is theoretically appealing [6]. In the Vdd-Hopping model, a processor can run at a finite number of
different speeds $(f_1, \ldots, f_m)$. It can also change its speed during a computation (hopping between different
voltages, and hence speeds). Any rational speed can therefore be simulated [18]. The energy consumed during the
execution of one task is the sum, on each time interval with constant speed $f$, of the energy consumed during this
interval at speed $f$.
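
As an illustration of the Vdd-Hopping accounting described above, here is a minimal Python sketch. It assumes the cubic power law stated earlier (a processor at speed $s$ dissipates $s^3$ watts); the function names `hop_between` and `energy` are hypothetical and chosen only for this example. It splits a time budget between two discrete modes so that the average speed equals a target speed, then sums the energy interval by interval.

```python
def hop_between(f_low, f_high, f_target, duration):
    """Split `duration` between two modes so that the average speed is f_target.

    Returns (time_at_f_low, time_at_f_high). Illustrative helper, not from the paper.
    """
    if not f_low <= f_target <= f_high:
        raise ValueError("target speed must lie between the two modes")
    if f_high == f_low:
        return duration, 0.0
    # Work conservation: f_low * t_low + f_high * t_high = f_target * duration.
    t_high = duration * (f_target - f_low) / (f_high - f_low)
    return duration - t_high, t_high


def energy(intervals):
    """Energy of a list of (speed, time) intervals under the s**3 power model."""
    return sum(speed ** 3 * time for speed, time in intervals)
```

For instance, simulating speed 1.5 for 4 time units with modes 1.0 and 2.0 spends 2 time units in each mode. The work done is the same as at constant speed 1.5, but the energy is higher because the power function is convex.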

Energy-aware scheduling aims at minimizing the energy consumed during the execution of the target application. 
Obviously, this goal makes sense only when coupled with some performance bound to achieve, otherwise, the optimal 
solution always is to run each processor at the slowest possible speed. In this paper, we consider a directed acyclic 
graph (DAG) of n tasks with precedence constraints, and the goal is to schedule such an application onto a fully 
homogeneous platform consisting of p identical processors. This problem has been widely studied with the objective of 
minimizing the total execution time, or makespan, and it is well known to be NP-complete [7]. Since the introduction
of DVFS, many papers have dealt with the optimization of energy consumption while enforcing a deadline, i.e., a 
bound on the makespan E] |6l [S] 111 . 

There are many situations in which the mapping of the task graph is given, say by an ordered list of tasks to execute 
on each processor, and we do not have the freedom to change the assignment of a given task. Such a problem occurs 
when optimizing for legacy applications, or accounting for affinities between tasks and resources, or even when tasks 
are pre-allocated [19], for example for security reasons. While it is not possible to change the allocation of a task,
it is possible to change its speed. This technique, which consists in exploiting the slack due to workload variations,
is called slack reclaiming [13, 18]. In our previous work [2], assuming that the mapping and a deadline are given,
we have assessed the impact of several speed variation models on the complexity of the problem of minimizing the
energy consumption. Rather than using a local approach such as backfilling [22, 18], which only reclaims gaps in the
schedule, we have considered the problem as a whole.

While energy consumption can be reduced by using speed scaling techniques, it was shown in [25, 10] that reducing
the speed of a processor increases the transient fault rate of the system; the probability of failures increases
exponentially, and this probability cannot be neglected in large-scale computing [16]. In order to make up for the loss
in reliability due to the energy efficiency, different models have been proposed for fault tolerance: (i) re-execution is
the model under study in this work, and it consists in re-executing a task that does not meet the reliability constraint;
it was also studied in [25, 24, 17]; (ii) replication was studied in [1, 12]; this model consists in executing the same
task on several processors simultaneously, in order to meet the reliability constraints; and (iii) checkpointing consists
in "saving" the work done at certain points of the execution, hence reducing the amount of work lost when a failure
occurs [14, 23].

This work focuses on the re-execution model, for several reasons. On the one hand, replication is too costly in 
terms of both resource usage and energy consumption: even if the first execution turns out successful (no failure 
occurred), the other executions will still have to take place. Moreover, the decision of which tasks should be replicated 
cannot be taken when the mapping is already fixed. On the other hand, checkpointing is hard to manage with parallel 
processors, and too costly when failures are rare. Altogether, it is the "online/no-waste" characteristic
of the corresponding algorithms that leads us to focus on re-execution. The goal is then to ensure that each task is
reliable enough, i.e., either its execution speed is above a threshold, ensuring a given reliability of the task, or the
task is executed twice to enhance its reliability. There is a clear trade-off between energy consumption and reliability,
since decreasing the execution speed of a task, and hence the corresponding energy consumption, deteriorates
the reliability. This calls for tackling the problem of considering the three criteria (makespan, reliability, energy)
simultaneously. This tri-criteria optimization brings dramatic complications: in addition to choosing the speed of each
task, as in the deadline/energy bi-criteria problem, we also need to decide which subset of tasks should be re-executed
(and then choose both execution speeds). Few authors have tackled this problem; we detail below the closest works to
ours [17, 24, 1].

Izosimov et al. [17] study a tri-criteria optimization problem with a given mapping on heterogeneous architectures.
However, they do not have any formal energy model, and they assume that the user specifies the maximum number of
failures per processor tolerated to satisfy the reliability constraint, while we consider any number of failures but ensure
a reliability threshold for each task. Zhu and Aydin [24] also address a tri-criteria optimization problem similar
to ours, and choose some tasks that have to be re-executed to match the reliability constraint. However, they restrict to
the scheduling problem on one single processor, and they consider only the energy consumption of the first execution
of a task (best-case scenario) when re-execution is done. Finally, Assayad et al. [1] have recently proposed an off-line
tri-criteria scheduling heuristic (TSH), which uses active replication to minimize the makespan, with a threshold on the 
global failure rate and the maximum power consumption. TSH is an improved critical-path list scheduling heuristic 
that takes into account power and reliability before deciding which task to assign and to duplicate onto the next free 
processors. The complexity of this heuristic is unfortunately exponential in the number of processors. Future work 
will be devoted to compare our heuristics to TSH, and hence to compare re-execution with replication. 

Given an application with dependence constraints and a mapping of this application on a homogeneous platform, 
we present in this paper theoretical results and tri-criteria heuristics that use re-execution in order to minimize the 
energy consumption under the constraints of both a reliability threshold per task and a deadline bound. The first 
contribution is a formal model for this tri-criteria scheduling problem (Section 2). The second contribution is to
provide theoretical results for the different speed models, CONTINUOUS (Section 3) and Vdd-Hopping (Section 4).
The third contribution is the design of novel tri-criteria scheduling heuristics that use re-execution to increase the
reliability of a system under the CONTINUOUS model (Section 5), and their evaluation through extensive simulations
(Section 6). To the best of our knowledge, this work is the first attempt to propose practical solutions to this tri-criteria
problem. Finally, we give concluding remarks and directions for future work in Section 7.

2 The tri-criteria problem 

Consider an application task graph $\mathcal{G} = (V, \mathcal{E})$, where $V = \{T_1, T_2, \ldots, T_n\}$ is the set of
tasks, $n = |V|$, and where $\mathcal{E}$ is the set of precedence edges between tasks. For $1 \le i \le n$, task $T_i$
has a weight $w_i$, that corresponds to the computation requirement of the task. We also consider particular classes of
task graphs, such as linear chains where $\mathcal{E} = \bigcup_{i=1}^{n-1} \{T_i \to T_{i+1}\}$, and forks with $n + 1$
tasks $\{T_0, T_1, T_2, \ldots, T_n\}$ and $\mathcal{E} = \bigcup_{i=1}^{n} \{T_0 \to T_i\}$.

We assume that tasks are mapped onto a parallel platform made up of $p$ identical processors. Each processor has a
set of available speeds that is either continuous (in the interval $[f_{\min}, f_{\max}]$) or discrete (with $m$ modes
$\{f_1, \ldots, f_m\}$),
depending on the speed model (CONTINUOUS or Vdd-Hopping). The goal is to minimize the energy consumed 
during the execution of the graph while enforcing a deadline bound and matching a reliability threshold. To match the 
reliability threshold, some tasks are executed once at a speed high enough to satisfy the constraint, while some other 
tasks need to be re-executed. We detail below the conditions that are enforced on the corresponding execution speeds. 
The problem is therefore to decide which task to re-execute, and at which speed to run each execution of a task. 

In this section, for the sake of clarity, we assume that a task is executed at the same (unique) speed throughout 
execution, or at two different speeds in the case of re-execution. In Section 3 we show that this strategy is indeed
optimal for the CONTINUOUS model; in Section 4 we show that only two different speeds are needed for the
Vdd-Hopping model (and we update the corresponding formulas accordingly). We now detail the three objective criteria
(makespan, reliability, energy), and then define formally the problem. 

2.1 Makespan 

The makespan of a schedule is its total execution time. The first task is scheduled at time 0, so that the makespan 
of a schedule is simply the maximum time at which one of the processors finishes its computations. We consider a 
deadline bound D, which is a constraint on the makespan. 
Let $\mathcal{E}xe(w_i, f)$ be the execution time of a task $T_i$ of weight $w_i$ at speed $f$. We assume that the cache
size is adapted to the application, therefore ensuring that the execution time is linearly related to the frequency [14]:
$\mathcal{E}xe(w_i, f) = \frac{w_i}{f}$. When a task is scheduled to be re-executed at two different speeds $f^{(1)}$ and
$f^{(2)}$, we always account for both executions, even when the first execution is successful, and hence
$\mathcal{E}xe(w_i, f^{(1)}, f^{(2)}) = \frac{w_i}{f^{(1)}} + \frac{w_i}{f^{(2)}}$. In other words,
we consider a worst-case execution scenario, and the deadline $D$ must be matched even in the case where all tasks that
are re-executed fail during their first execution.

2.2 Reliability 

To define the reliability, we use the fault model of Zhu et al. [25, 24]. Transient failures are faults caused by software
errors for example. They invalidate only the execution of the current task and the processor subject to that failure will
be able to recover and execute the subsequent task assigned to it (if any). In addition, we use the reliability model
introduced by Shatz and Wang [21], which states that the radiation-induced transient faults follow a Poisson
distribution. The parameter $\lambda$ of the Poisson distribution is then:

$$\lambda(f) = \lambda_0 \, e^{d \frac{f_{\max} - f}{f_{\max} - f_{\min}}}, \tag{1}$$

where $f_{\min} \le f \le f_{\max}$ is the processing speed, the exponent $d \ge 0$ is a constant, indicating the
sensitivity of fault rates to DVFS, and $\lambda_0$ is the average fault rate corresponding to $f_{\max}$. We see that
reducing the speed for energy saving increases the fault rate exponentially. The reliability of a task $T_i$ executed once
at speed $f$ is $R_i(f) = e^{-\lambda(f) \times \mathcal{E}xe(w_i, f)}$.
Because the fault rate is usually very small, of the order of $10^{-6}$ per time unit in [5, 17], $10^{-5}$ in [1], we can use the
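first order approximation of $R_i(f)$ as

$$R_i(f) = 1 - \lambda(f) \times \mathcal{E}xe(w_i, f) = 1 - \lambda_0 \, e^{d \frac{f_{\max} - f}{f_{\max} - f_{\min}}} \times \frac{w_i}{f} = 1 - \tilde{\lambda}_0 \, e^{-\tilde{d} f} \times \frac{w_i}{f}, \tag{2}$$

where $\tilde{d} = \frac{d}{f_{\max} - f_{\min}}$ and $\tilde{\lambda}_0 = \lambda_0 e^{\tilde{d} f_{\max}}$. This equation
holds if $\varepsilon_i = \lambda(f) \times \frac{w_i}{f} \ll 1$. With, say, $\lambda(f) = 10^{-5}$, we
need $\frac{w_i}{f} \le 10^{3}$ to get an accurate approximation with $\varepsilon_i \le 0.01$: the task should execute
within 16 minutes. In other words, large (computationally demanding) tasks require reasonably high processing speeds
with this model (which makes full sense in practice).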
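
To make the quality of the first-order approximation concrete, the following Python sketch compares the exponential reliability with its linearization. It assumes the fault-rate law $\lambda(f) = \lambda_0 e^{d (f_{\max} - f)/(f_{\max} - f_{\min})}$ of the model above; the function names and the numerical values used in any call are illustrative assumptions, not values taken from the paper.

```python
import math


def fault_rate(f, lam0, d, f_min, f_max):
    """Fault rate at speed f: lam0 at f_max, growing exponentially as f decreases."""
    return lam0 * math.exp(d * (f_max - f) / (f_max - f_min))


def reliability_exact(w, f, lam0, d, f_min, f_max):
    """Reliability of one execution of a task of weight w at speed f."""
    return math.exp(-fault_rate(f, lam0, d, f_min, f_max) * w / f)


def reliability_first_order(w, f, lam0, d, f_min, f_max):
    """First-order approximation 1 - lambda(f) * w / f."""
    return 1.0 - fault_rate(f, lam0, d, f_min, f_max) * w / f
```

Since $e^{-x} \ge 1 - x$, the approximation is always slightly pessimistic, and the gap is about $x^2/2$ where $x = \lambda(f) \, w_i / f$, which is negligible when $x \ll 1$.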

We want the reliability $R_i$ of each task $T_i$ to be greater than a given threshold, namely $R_i(f_{rel})$, hence
enforcing a local constraint dependent on the task: $R_i \ge R_i(f_{rel})$. If task $T_i$ is executed only once at speed
$f$, then the reliability of $T_i$ is $R_i = R_i(f)$. Since the reliability increases with speed, we must have
$f \ge f_{rel}$ to match the reliability constraint. If task $T_i$ is re-executed (speeds $f^{(1)}$ and $f^{(2)}$), then
the execution of $T_i$ is successful unless both attempts fail, so that the reliability of $T_i$ is
$R_i = 1 - (1 - R_i(f^{(1)}))(1 - R_i(f^{(2)}))$, and this quantity should be at least equal to $R_i(f_{rel})$.

2.3 Energy 

The total energy consumption corresponds to the sum of the energy consumption of each task. Let $E_i$ be the energy
consumed by task $T_i$. For one execution of task $T_i$ at speed $f$, the corresponding energy consumption is
$E_i(f) = \mathcal{E}xe(w_i, f) \times f^3 = w_i \times f^2$, which corresponds to the dynamic part of the classical
energy models of the literature [4, 2, 8, 3]. Note that we do not take static energy into account, because all processors
are up and alive during the whole execution.

If task $T_i$ is executed only once at speed $f$, then $E_i = E_i(f)$. Otherwise, if task $T_i$ is re-executed at speeds
$f^{(1)}$ and $f^{(2)}$, it is natural to add up the energy consumed during both executions, just as we add up both
execution times when enforcing the makespan deadline. Again, this corresponds to the worst-case execution scenario. We
obtain $E_i = E_i(f^{(1)}) + E_i(f^{(2)})$. Note that some authors [24] consider only the energy spent for the first
execution, which seems unfair: re-execution comes at a price both in the deadline and in the energy consumption.
Finally, the total energy consumed by the schedule, which we aim at minimizing, is $E = \sum_{i=1}^{n} E_i$.
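
The three criteria of this section can be evaluated together for a single task. The sketch below is a minimal illustration under the first-order reliability approximation and the worst-case accounting described above (both executions are always counted). The `Platform` container and the function names are hypothetical, introduced only for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Platform:
    lam0: float   # average fault rate at f_max
    d: float      # sensitivity of the fault rate to DVFS
    f_min: float
    f_max: float
    f_rel: float  # speed defining the per-task reliability threshold


def failure_prob(p, w, f):
    """First-order failure probability of one execution of weight w at speed f."""
    lam = p.lam0 * math.exp(p.d * (p.f_max - f) / (p.f_max - p.f_min))
    return lam * w / f


def evaluate(p, w, speeds):
    """Return (time, energy, reliable) for a task run once or twice.

    `speeds` holds one speed (single execution) or two (re-execution).
    """
    time = sum(w / f for f in speeds)            # worst case: every execution counted
    energy = sum(w * f ** 2 for f in speeds)     # dynamic energy w * f^2 per execution
    fail = math.prod(failure_prob(p, w, f) for f in speeds)  # all attempts fail
    reliable = fail <= failure_prob(p, w, p.f_rel)
    return time, energy, reliable
```

With such a helper one can observe the trade-off discussed in the introduction: running a task twice at a low speed may consume less energy than running it once at $f_{rel}$ while still meeting the reliability threshold, at the price of a longer worst-case execution time.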
2.4 Optimization problems



The two main optimization problems are derived from the two different speed models: 

• Tri-Crit-Cont. Given an application graph $\mathcal{G} = (V, \mathcal{E})$, mapped onto $p$ homogeneous processors with
continuous speeds, Tri-Crit-Cont is the problem of deciding which tasks should be re-executed and at which
speed each execution of a task should be processed, in order to minimize the total energy consumption $E$, subject
to the deadline bound $D$ and to the local reliability constraints $R_i \ge R_i(f_{rel})$ for each $T_i \in V$.

• Tri-Crit-Vdd. This is the same problem as Tri-Crit-Cont, but with the Vdd-Hopping model. 

We also introduce variants of the problems for particular application graphs: Tri-Crit-Cont-Chain is the same
problem as Tri-Crit-Cont when the task graph is a linear chain, mapped on a single processor; and
Tri-Crit-Cont-Fork is the same problem as Tri-Crit-Cont when the task graph is a fork, and each task is mapped on a
distinct processor. We have similar definitions for the Vdd-Hopping model.



3 Continuous model

As stated in Section 2, we start by proving that with the CONTINUOUS model, it is always optimal to execute a task at
a unique speed throughout its execution:

Lemma 1. With the CONTINUOUS model, it is optimal to execute each task at a unique speed throughout its execution.

The idea is to consider a task whose speed changes during the execution; we exhibit a speed such that the execution
time of the task remains the same, but where both energy and reliability are potentially improved, by convexity of the
functions.

Proof. We can assume without loss of generality that the function that gives the speed of the execution of a task is a
piecewise-constant function. The proof of the general case is a direct corollary from the theorem that states that any
piecewise-continuous function defined on an interval $[a, b]$ can be uniformly approximated as closely as desired by a
piecewise-constant function [20].

Suppose that in the optimal solution, there is a task whose speed changes during the execution. Consider the first
time-step at which the change occurs: the computation begins at speed $f$ from time $t$ to time $t'$, and then continues
at speed $f'$ until time $t''$. The total energy consumption for this task in the time interval $[t, t'']$ is
$E = (t' - t) \times f^3 + (t'' - t') \times (f')^3$. Moreover, the amount of work done for this task is
$W = (t' - t) \times f + (t'' - t') \times f'$. The reliability of the task is exactly
$1 - \tilde{\lambda}_0 \left( (t' - t) \times e^{-\tilde{d} f} + (t'' - t') \times e^{-\tilde{d} f'} + r \right)$, where $r$
is a constant due to the reliability of the rest of the process, which is independent from what happens during
$[t, t'']$. The reliability is a function that increases when the function
$h(t, t', t'', f, f') = (t' - t) \times e^{-\tilde{d} f} + (t'' - t') \times e^{-\tilde{d} f'}$ decreases.

If we run the task during the whole interval $[t, t'']$ at constant speed $f_d = W / (t'' - t)$, the same amount of work
is done within the same time, and the energy consumption during this interval of time becomes
$E' = (t'' - t) \times f_d^3$. Note that the new speed can be expressed as $f_d = a f + (1 - a) f'$, where
$0 < a = \frac{t' - t}{t'' - t} < 1$. Therefore, because of the convexity of the function $x \mapsto x^3$, we have
$E' < E$. Similarly, since $x \mapsto e^{-\tilde{d} x}$ is a convex function,
$h(t, t', t'', f_d, f_d) < h(t, t', t'', f, f')$, and the reliability constraint is also matched. This contradicts the
hypothesis of optimality of the first solution, and concludes the proof. □

Next we show that not only a task is executed at a single speed, but that its re-execution (whenever it occurs) is
executed at the same speed as its first execution:

Lemma 2. With the CONTINUOUS model, it is optimal to re-execute each task (whenever needed) at the same speed
as its first execution, and this speed $f$ is such that $f_i^{(inf)} \le f < \frac{1}{\sqrt{2}} f_{rel}$, where

$$\tilde{\lambda}_0 \, w_i \, \frac{e^{-2 \tilde{d} f_i^{(inf)}}}{\left(f_i^{(inf)}\right)^2} = \frac{e^{-\tilde{d} f_{rel}}}{f_{rel}}. \tag{3}$$
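
The lower bound $f_i^{(inf)}$ of Lemma 2 has no closed form in general, but it can be computed numerically. Under the first-order approximation of Section 2.2, two executions at speed $f$ exactly meet the threshold when $\left(\tilde{\lambda}_0 w_i e^{-\tilde{d} f} / f\right)^2 = \tilde{\lambda}_0 w_i e^{-\tilde{d} f_{rel}} / f_{rel}$. The following Python sketch solves this equation by bisection; the function name `f_inf`, its parameters and the tolerance are assumptions made for the example, and the lower bound $f_{\min}$ of the platform is deliberately ignored.

```python
import math


def f_inf(w, lam0_tilde, d_tilde, f_rel, tol=1e-12):
    """Smallest speed at which two executions meet the reliability threshold.

    Solves lam0_tilde * w * exp(-2 * d_tilde * f) / f**2 = exp(-d_tilde * f_rel) / f_rel
    by bisection; the left-hand side is strictly decreasing in f.
    """
    target = math.exp(-d_tilde * f_rel) / f_rel

    def g(f):
        return lam0_tilde * w * math.exp(-2.0 * d_tilde * f) / f ** 2 - target

    lo, hi = 1e-9 * f_rel, f_rel
    if g(lo) <= 0.0 or g(hi) >= 0.0:
        raise ValueError("no root in (0, f_rel): check the parameters")
    while hi - lo > tol * f_rel:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid   # still unreliable at mid: the root is above
        else:
            hi = mid   # reliable at mid: the root is at or below
    return hi
```

When $\tilde{d} = 0$ the equation reduces to $\lambda_0 w_i / f^2 = 1 / f_{rel}$, so the result should agree with $\sqrt{\lambda_0 w_i f_{rel}}$.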
Similarly to the proof of Lemma 1, we exhibit a unique speed for both executions, in case they differ, so that the
execution time remains identical but both energy and reliability are improved. If this unique speed is greater than
$\frac{1}{\sqrt{2}} f_{rel}$, then it is better to execute the task only once at speed $f_{rel}$, and if $f$ is lower
than $f_i^{(inf)}$, then the reliability constraint is not matched.

Proof. Consider a task $T_i$ executed a first time at speed $f_i$, and a second time at speed $f_i' > f_i$. Assume first
that $d = 0$, i.e., the reliability of task $T_i$ executed at speed $f_i$ is $R_i(f_i) = 1 - \lambda_0 \frac{w_i}{f_i}$.
We show that executing task $T_i$ twice at speed $f = \sqrt{f_i f_i'}$ improves the energy consumption while matching the
deadline and reliability constraints. Clearly the reliability constraint is matched, since
$1 - \lambda_0^2 \frac{w_i^2}{f^2} = 1 - \lambda_0^2 \frac{w_i^2}{f_i f_i'}$. The fact that the deadline constraint
is matched is due to the fact that $\sqrt{f_i f_i'} \ge \frac{2 f_i f_i'}{f_i + f_i'}$ (by squaring both sides of the
inequality we obtain $(f_i - f_i')^2 \ge 0$). Then we use the fact that $f_d = \frac{2 f_i f_i'}{f_i + f_i'}$ is the
minimal speed such that $\forall f \ge f_d$, $\frac{2 w_i}{f} \le \frac{w_i}{f_i} + \frac{w_i}{f_i'}$. Finally, it is
easy to see that the energy consumption is improved since $2 f_i f_i' \le f_i^2 + f_i'^2$, hence
$2 w_i f_i f_i' \le w_i f_i^2 + w_i f_i'^2$.

In the general case when $d \ne 0$, instead of having a closed form formula for the new speed $f$ common to both
executions, we have $f = \max(f_1, f_2)$, where $f_1$ is dictated by the reliability constraint, while $f_2$ is dictated
by the deadline constraint. $f_1$ is the solution to the equation
$2(\tilde{d} X + \ln X) = (\tilde{d} f_i + \ln f_i) + (\tilde{d} f_i' + \ln f_i')$; this equation comes
from the reliability constraint: the minimum speed $X$ to match the reliability is obtained with
$1 - \tilde{\lambda}_0^2 w_i^2 \frac{e^{-2 \tilde{d} X}}{X^2} = 1 - \tilde{\lambda}_0^2 w_i^2 \frac{e^{-\tilde{d}(f_i + f_i')}}{f_i f_i'}$.
The deadline constraint must also be enforced, and hence $f_2 = \frac{2 f_i f_i'}{f_i + f_i'}$ (minimum speed to match
the deadline). Then the fact that the energy does not increase comes from the convexity of this function.

Let $f$ be the unique speed at which the task is executed (twice). If $f > \frac{1}{\sqrt{2}} f_{rel}$, then executing the
task only once at speed $f_{rel}$ has a lower energy consumption and execution time, while still matching the reliability
constraint. Hence it is not optimal to re-execute the task unless $f < \frac{1}{\sqrt{2}} f_{rel}$. Finally, note that $f$
must be greater than $f_i^{(inf)}$, solution of Equation (3), since $f_i^{(inf)}$ is the minimum speed such that the
reliability constraint is met if task $T_i$ is executed twice at the same speed. □
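
The case $d = 0$ of this proof is easy to check numerically. The short Python sketch below, with a hypothetical helper name `merge_speeds`, replaces two distinct execution speeds by their geometric mean and reports the worst-case time, the energy and the product of failure terms (up to the constant factor $\lambda_0^2$) before and after the change.

```python
import math


def merge_speeds(w, f1, f2):
    """Compare two executions at (f1, f2) with two executions at sqrt(f1 * f2), for d = 0.

    Returns (before, after), two dicts with worst-case time, energy and the
    product of the per-execution failure terms w / f (the factor lam0**2 is omitted).
    """
    f = math.sqrt(f1 * f2)
    before = {
        "time": w / f1 + w / f2,
        "energy": w * (f1 ** 2 + f2 ** 2),
        "fail": (w / f1) * (w / f2),
    }
    after = {
        "time": 2.0 * w / f,
        "energy": 2.0 * w * f ** 2,
        "fail": (w / f) ** 2,
    }
    return before, after
```

With speeds 0.4 and 0.9, for instance, the common speed is 0.6: the failure term is unchanged, while both the execution time and the energy decrease, as the inequalities of the proof predict.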

Note that both lemmas can be applied to any solution of the Tri-Crit-Cont problem, not just optimal solutions,
hence all heuristics of Section 5 will assign a unique speed to each task, be it re-executed or not.
We are now ready to assess the problem complexity: 

Theorem 1. The Tri-Crit-Cont-Chain problem is NP-hard, but not known to be in NP. 

Note that the problem is not known to be in NP because speeds could take any real values (CONTINUOUS model).
The hardness comes from a reduction from SUBSET-SUM [11]. The problem is NP-hard even for a linear chain application
mapped on a single processor (and any general DAG mapped on a single processor becomes a linear chain).

Proof. Consider the associated decision problem: given a deadline, and energy and reliability bounds, can we schedule
the graph to match all these bounds? Since the speeds could take any real values, the problem is not known to be in
NP. For the hardness, we use a reduction from SUBSET-SUM [11]. Let $I_1$ be an instance of SUBSET-SUM:
given $n$ strictly positive integers $a_1, \ldots, a_n$, and a positive integer $X$, does there exist a subset $I$ of
$\{1, \ldots, n\}$ such that $\sum_{i \in I} a_i = X$? Let $S = \sum_{i=1}^{n} a_i$.

We build the following instance $I_2$ of our problem. The execution graph is a linear chain with $n$ tasks, where:

• task $T_i$ has weight $w_i = a_i$;

• $\lambda_0 = \frac{f_{\max}}{100 \max_i a_i}$;

• $f_{\min} = \sqrt{\lambda_0 \max_i a_i \, f_{\max}} = \frac{1}{10} f_{\max}$;

• $f_{rel} = f_{\max}$, $d = 0$.


The bounds on reliability, deadline and energy are:

• $R_i^0 = R_i(f_{rel}) = 1 - \lambda_0 \frac{a_i}{f_{rel}}$ for $1 \le i \le n$;

• $D_0 = \frac{S}{f_{rel}} + \frac{X}{c f_{rel}}$, where $c$ is the unique positive real root of the polynomial
$7y^3 + 21y^2 - 3y - 1$. Analytically, we derive that
$c = 4 \sqrt{\frac{2}{7}} \cos\left(\frac{1}{3}\left(\pi - \tan^{-1} \frac{1}{\sqrt{7}}\right)\right) - 1$
($\approx 0.2838$); but this value is irrational, so we have to encode it symbolically rather than numerically;

• $E_0 = 2X \left(\frac{2c}{1+c} f_{rel}\right)^2 + (S - X) f_{rel}^2$.

Clearly, the size of $I_2$ is polynomial in the size of $I_1$.

Suppose first that instance $I_1$ has a solution, $I$. For all $i \in I$, $T_i$ is executed twice at speed
$\frac{2c}{1+c} f_{rel}$. Otherwise, for all $i \notin I$, it is executed only once at speed $f_{rel}$. The execution
time is

$$\sum_{i \notin I} \frac{a_i}{f_{rel}} + \sum_{i \in I} \frac{2 a_i}{\frac{2c}{1+c} f_{rel}} = \frac{S - X}{f_{rel}} + 2X \frac{1 + c}{2c \, f_{rel}} = D_0.$$

The reliability constraint is obviously met for tasks not in $I$. It is also met for all tasks in $I$, since
$\frac{2c}{1+c} f_{rel} > f_{\min}$, and two executions at $f_{\min}$ are sufficient to match the reliability
constraint. Indeed,
$1 - \lambda_0^2 \frac{a_i^2}{f_{\min}^2} = 1 - \lambda_0 \frac{a_i^2}{\max_j a_j \, f_{\max}} \ge 1 - \lambda_0 \frac{a_i}{f_{rel}} = R_i^0$.
The energy consumption is exactly $E_0$. All bounds are respected, and therefore we have a solution to $I_2$.

Suppose now that $I_2$ has a solution. Let $I = \{i \mid T_i \text{ is executed twice in the solution}\}$, and
$Y = \sum_{i \in I} a_i$. We prove in the following that necessarily $Y = X$, since the energy constraint $E_0$ is
respected in $I_2$.

We first point out that tasks executed only once are necessarily executed at maximum speed to match the reliability
constraint. Then consider the problem of minimizing the energy of a set of tasks, some executed twice, some executed
once at maximum speed, and assume that we have a deadline $D_0$ to match, but no constraint on reliability or on
$f_{\min}$. We will verify later that these additional two constraints are indeed satisfied by the optimal solution when
the only constraint is the deadline. Thanks to Lemma 2, for all $i \in I$, task $T_i$ is executed twice at the same
speed. It is easy to see that in fact all tasks in $I$ are executed at the same speed, otherwise we could decrease the
energy consumption without modifying the execution time, by convexity of the function. Let $f$ be the speed of execution
(and re-execution) of task $T_i$, with $i \in I$. Because the deadline is the only constraint, either $Y = 0$ (no tasks
are re-executed), or it is optimal to exactly match the deadline $D_0$ (otherwise we could just slow down all the
re-executed tasks and this would decrease the total energy). Hence the problem amounts to finding the values of $Y$ and
$f$ that minimize the function $E = 2Y f^2 + (S - Y) f_{rel}^2$, with the constraint
$\frac{S - Y}{f_{rel}} + \frac{2Y}{f} \le D_0$. First, note that if $Y = 0$ then $E > E_0$, and hence $Y > 0$ (since it
corresponds to a solution of $I_2$). Therefore, since the deadline is tight, we have
$f = \frac{2Y f_{rel}}{D_0 f_{rel} - (S - Y)}$, and finally the energy consumption can be expressed as

$$E(Y) = \left( \frac{8 Y^3}{(D_0 f_{rel} - S + Y)^2} + (S - Y) \right) f_{rel}^2.$$

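We aim at finding the minimum of this function. Let $\tilde{Y} = \frac{Y}{D_0 f_{rel} - S}$. Then we have
$E(Y) = \left( \frac{8 \tilde{Y}^3}{(1 + \tilde{Y})^2} + \frac{S}{D_0 f_{rel} - S} - \tilde{Y} \right) \times (D_0 f_{rel} - S) f_{rel}^2$.
Differentiating with respect to $\tilde{Y}$, we obtain

$$E'(\tilde{Y}) = \left( \frac{24 \tilde{Y}^2}{(1 + \tilde{Y})^2} - \frac{16 \tilde{Y}^3}{(1 + \tilde{Y})^3} - 1 \right) (D_0 f_{rel} - S) f_{rel}^2.$$

Finally, $E'(\tilde{Y}) = 0$ if and only if

$$24 \tilde{Y}^2 (1 + \tilde{Y}) - 16 \tilde{Y}^3 - (1 + \tilde{Y})^3 = 0. \tag{4}$$

Expanding the left-hand side gives $7 \tilde{Y}^3 + 21 \tilde{Y}^2 - 3 \tilde{Y} - 1$. The only positive solution of
Equation (4) is therefore $\tilde{Y} = c$, and the unique minimum of $E(Y)$ is obtained for

$$Y = c (D_0 f_{rel} - S) = X.$$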

Note that for $Y = X$, we have $E = E_0$, and therefore any other value of $Y$ would not correspond to a solution.
There remains to check that the solution matches both constraints on $f_{\min}$ and on reliability, to confirm the
hypothesis on the speed of tasks that are re-executed. Using the same argument as in the first part of the proof, we see
that the reliability constraint is respected when a task is executed twice at $f_{\min}$, and therefore we just need to
check that $f \ge f_{\min}$. For $Y = X$, we have $f = \frac{2c}{1+c} f_{rel} > f_{\min}$.

Altogether, we have $\sum_{i \in I} a_i = Y = X$, and therefore $I_1$ has a solution. This concludes the proof. □
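
The constants used in this reduction can be checked numerically. The Python sketch below evaluates the closed form of $c$, verifies that it is a root of $7y^3 + 21y^2 - 3y - 1$, and evaluates the part of the energy that depends on the normalized variable $Y / (D_0 f_{rel} - S)$, namely $8y^3/(1+y)^2 - y$. The names `C`, `poly` and `normalized_energy` are illustrative, and a floating-point check is of course only a sanity test, not a substitute for the symbolic encoding required by the proof.

```python
import math

# Closed form of the unique positive root of 7y^3 + 21y^2 - 3y - 1.
C = 4.0 * math.sqrt(2.0 / 7.0) * math.cos((math.pi - math.atan(1.0 / math.sqrt(7.0))) / 3.0) - 1.0


def poly(y):
    """Polynomial whose positive root defines the constant c of the reduction."""
    return 7.0 * y ** 3 + 21.0 * y ** 2 - 3.0 * y - 1.0


def normalized_energy(y):
    """Energy as a function of y = Y / (D0 * f_rel - S), up to an additive
    constant and the positive factor (D0 * f_rel - S) * f_rel**2."""
    return 8.0 * y ** 3 / (1.0 + y) ** 2 - y


def reexecution_speed_ratio(c=C):
    """Ratio f / f_rel of the common speed of re-executed tasks, 2c / (1 + c)."""
    return 2.0 * c / (1.0 + c)
```

The ratio $2c/(1+c)$ is roughly 0.44, which lies between $f_{\min}/f_{\max} = 0.1$ and $1/\sqrt{2}$, consistent with the bounds of Lemma 2.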
Even if Tri-Crit-Cont-Chain is NP-hard, we can characterize an optimal solution of the problem: 
Proposition 1. If $f_{rel} < f_{\max}$, then in any optimal solution of Tri-Crit-Cont-Chain, either all tasks are executed
only once, at constant speed $\max\left(\frac{\sum_{i=1}^{n} w_i}{D}, f_{rel}\right)$, or at least one task is re-executed,
and then all tasks that are not re-executed are executed at speed $f_{rel}$.

Proof. Consider an optimal schedule. If all tasks are executed only once, the smallest energy consumption is obtained
when using the constant speed $\frac{\sum_{i=1}^{n} w_i}{D}$. However if $\frac{\sum_{i=1}^{n} w_i}{D} < f_{rel}$, then
we have to execute all tasks at speed $f_{rel}$ to match both reliability and deadline constraints.

Now, assume that some task $T_i$ is re-executed, and assume by contradiction that some other task $T_j$ is executed
only once at speed $f_j > f_{rel}$. Note that the common speed $f_i$ used in both executions of $T_i$ is smaller than
$f_{rel}$, otherwise we would not need to re-execute $T_i$. We have $f_i < f_{rel} < f_j$, and we prove that there exist
values $f_i'$ (new speed of one execution of $T_i$) and $f_j'$ (new speed of $T_j$) such that $f_i < f_i'$,
$f_{rel} \le f_j' < f_j$, and the energy consumed with the new speeds is strictly smaller, while the execution time is
unchanged. The constraint on reliability will also be met, since the speed of one execution of $T_i$ is increased, while
the speed of $T_j$ remains above the reliability threshold. Note that we do not modify the speed of the re-execution of
$T_i$ (that remains $f_i$), and the time and energy consumption of this execution are not accounted for in the equations.
Also, we restrict to values such that $f_i' \le f_j'$.

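Our problem writes: do there exist $\varepsilon, \varepsilon' > 0$ such that

$$w_i f_i^2 + w_j f_j^2 > w_i (f_i + \varepsilon')^2 + w_j (f_j - \varepsilon)^2;$$

$$D = \frac{w_i}{f_i} + \frac{w_j}{f_j} = \frac{w_i}{f_i + \varepsilon'} + \frac{w_j}{f_j - \varepsilon};$$

$$f_i < f_i + \varepsilon' \le f_j - \varepsilon;$$

$$f_{rel} \le f_j - \varepsilon < f_j.$$

We study the function
$\phi : \varepsilon \mapsto w_i f_i^2 + w_j f_j^2 - \left(w_i (f_i + \varepsilon')^2 + w_j (f_j - \varepsilon)^2\right)$,
and we want to prove that it is positive. Thanks to the deadline constraint ($D$ is the bound on the execution time of
$T_j$ plus one execution of $T_i$), we have $f_i = \frac{w_i f_j}{D f_j - w_j}$, and
$f_i + \varepsilon' = \frac{w_i}{D - \frac{w_j}{f_j - \varepsilon}} = \frac{w_i (f_j - \varepsilon)}{D (f_j - \varepsilon) - w_j}$.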



We can therefore express $\phi(\varepsilon)$ as:

$$\phi(\varepsilon) = \frac{w_i^3 f_j^2}{(D f_j - w_j)^2} - \frac{w_i^3 (f_j - \varepsilon)^2}{(D (f_j - \varepsilon) - w_j)^2} + w_j f_j^2 - w_j (f_j - \varepsilon)^2.$$

Moreover, we study the function for $\varepsilon > 0$, and because of the constraint on new speeds,
$\varepsilon \le f_j - f_{rel}$. Another bound on $\varepsilon$ is obtained from the fact that
$f_i + \varepsilon' \le f_j - \varepsilon$, and the equality is obtained when both tasks are running at
speed $\frac{w_i + w_j}{D}$, thus meeting the deadline. Hence, $f_j - \varepsilon \ge \frac{w_i + w_j}{D}$, and finally

$$0 < \varepsilon \le f_j - \max\left(f_{rel}, \frac{w_i + w_j}{D}\right).$$

Differentiating, we obtain

$$\phi'(\varepsilon) = \frac{2 w_i^3 (f_j - \varepsilon)}{(D (f_j - \varepsilon) - w_j)^2} - \frac{2 D w_i^3 (f_j - \varepsilon)^2}{(D (f_j - \varepsilon) - w_j)^3} + 2 w_j (f_j - \varepsilon).$$

We are looking for $\varepsilon$ such that $\phi'(\varepsilon) = 0$, hence obtaining the polynomial $X^3 - w_i^3 = 0$
by multiplying each side of the equation by $\frac{(D (f_j - \varepsilon) - w_j)^3}{2 w_j (f_j - \varepsilon)}$, defining
$X = D (f_j - \varepsilon) - w_j$. The only real solution to this polynomial is $X = w_i$, that corresponds to
$\varepsilon = f_j - \frac{w_i + w_j}{D}$. Therefore, the only extremum of the function $\phi$ is
obtained for this value of $\varepsilon$, which corresponds to executing both tasks at the same speed. Because of the
convexity of the energy consumption, this value corresponds to a maximum of function $\phi$ (see for instance
Proposition 2 in [2]), since the energy is minimized when both tasks run at the same speed. Therefore, $\phi$ is strictly
increasing for $0 \le \varepsilon \le f_j - \frac{w_i + w_j}{D}$, and for
$\varepsilon = f_j - \max\left(f_{rel}, \frac{w_i + w_j}{D}\right)$, $\phi(\varepsilon)$ is maximal (with regards to our
constraints), and $\phi(\varepsilon) > 0$.

Altogether, this value of $\varepsilon$ gives us two new speeds
$f_i' = \frac{w_i (f_j - \varepsilon)}{D (f_j - \varepsilon) - w_j}$ and $f_j' = f_j - \varepsilon$ that strictly improve
the energy consumption of the schedule, while the constraints on deadline and reliability are still enforced. Since the
original schedule was supposed to be optimal, we have a contradiction, which concludes the proof. □
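
The exchange argument of this proof can be replayed on a small example. The Python sketch below, whose function name `rebalance` is hypothetical, takes one execution of a re-executed task $T_i$ (speed $f_i$) and a single-execution task $T_j$ (speed $f_j > f_{rel}$), slows $T_j$ down to $\max(f_{rel}, (w_i + w_j)/D)$ and speeds up the execution of $T_i$ so that the total time $D$ of these two executions is unchanged.

```python
def rebalance(w_i, w_j, f_i, f_j, f_rel):
    """Return new speeds (f_i_new, f_j_new) with the same total time and less energy.

    D is the time of T_j plus one execution of T_i; the other execution of T_i
    is left untouched and therefore not modelled here.
    """
    D = w_i / f_i + w_j / f_j
    f_j_new = max(f_rel, (w_i + w_j) / D)        # slow T_j down as far as allowed
    f_i_new = w_i * f_j_new / (D * f_j_new - w_j)  # keep the total time equal to D
    return f_i_new, f_j_new


def pair_energy(w_i, w_j, f_i, f_j):
    """Dynamic energy w * f^2 of the two executions considered."""
    return w_i * f_i ** 2 + w_j * f_j ** 2
```

With $w_i = w_j = 1$, $f_i = 0.5$, $f_j = 1$ and $f_{rel} = 0.8$, the new speeds are about 0.571 and 0.8: the time budget of 3 is preserved and the energy of the pair drops from 1.25 to about 0.97.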

In essence, Proposition 1 states that when dealing with a linear chain, we should first slow down the execution
of each task as much as possible. Then, if the deadline is not too tight, i.e., if
$f_{rel} > \frac{\sum_{i=1}^{n} w_i}{D}$, there remains the
possibility to re-execute some of the tasks (and of course it is NP-hard to decide which ones). Still, this general
principle "first slow-down and then re-execute" will guide the design of type A heuristics in Section 5.

While the general Tri-Crit-Cont problem is NP-hard even with a single processor, the particular variant Tri- 
Crit-Cont-Fork can be solved in polynomial time: 

Theorem 2. The Tri-Crit-Cont-Fork problem can be solved in polynomial time. 

The difficulty in providing an optimal algorithm for the Tri-Crit-Cont-Fork problem comes from the fact that the total execution time must be shared between the source of the fork, $T_0$, and the other tasks that all run in parallel. If we know $D'$, the fraction of the deadline allotted for tasks $T_1, \dots, T_n$ once the source has finished its execution, then we can decide which tasks are re-executed and all execution speeds.

Proof. We start by showing that Tri-Crit-Cont can be solved in polynomial time for one single task, and then for $n$ independent tasks, before tackling the problem Tri-Crit-Cont-Fork.

Tri-Crit-Cont for a single task on one processor can be solved in polynomial time. When there is a single task $T$ of weight $w$, the solution depends on the deadline $D$:

1. if $D < \frac{w}{f_{\max}} = D^{(0)}$, then there is no solution;

2. if $\frac{w}{f_{\max}} \le D < \frac{w}{f_{rel}} = D^{(1)}$, then $T$ is executed once at speed $\frac{w}{D}$, the minimum energy is $w^3 \times \frac{1}{D^2}$;

3. if $\frac{w}{f_{rel}} \le D < \frac{2\sqrt{2}\,w}{f_{rel}} = D^{(2)}$, then $T$ is executed once at speed $f_{rel}$, the minimum energy is $w f_{rel}^2$;

4. if $\frac{2\sqrt{2}\,w}{f_{rel}} \le D < \frac{2w}{f^{(inf)}} = D^{(3)}$, then $T$ is executed twice at speed $\frac{2w}{D}$, the minimum energy is $(2w)^3 \times \frac{1}{D^2}$;

5. if $\frac{2w}{f^{(inf)}} \le D$, then $T$ is executed twice at speed $f^{(inf)}$, the minimum energy is $2w (f^{(inf)})^2$.

These results are a direct consequence of the deadline and reliability constraints. With a deadline smaller than $D^{(0)}$, the task cannot be executed within the deadline, even at speed $f_{\max}$. The bound $D^{(2)}$ comes from Lemma 2, which states that we need to have enough time to execute the task twice at a speed lower than $\frac{1}{\sqrt{2}} f_{rel}$ before re-executing it. Therefore, the task is executed only once for smaller deadlines, either at speed $w/D$, or at speed $f_{rel}$ if $w/D < f_{rel}$. For larger deadlines, the task is re-executed, either at speed $2w/D$, or at speed $f^{(inf)}$ if $2w/D < f^{(inf)}$, since the re-execution speed cannot be lower than $f^{(inf)}$ (see Lemma 2).
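The five-case analysis for a single task can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `single_task_schedule` and its argument names are hypothetical, the thresholds are taken as $D^{(0)} = w/f_{\max}$, $D^{(1)} = w/f_{rel}$, $D^{(2)} = 2\sqrt{2}\,w/f_{rel}$ and $D^{(3)} = 2w/f^{(inf)}$, and it assumes $f^{(inf)} < f_{rel}/\sqrt{2}$ so that the thresholds are ordered.

```python
import math

def single_task_schedule(w, D, f_max, f_rel, f_inf):
    """Return (number of executions, speed, energy) for one task of weight w
    and deadline D, or None when no schedule exists.
    Assumption: f_inf < f_rel / sqrt(2), so that D0 <= D1 <= D2 <= D3."""
    d0 = w / f_max                      # below this, even f_max is too slow
    d1 = w / f_rel                      # from here on, one execution at f_rel fits
    d2 = 2 * math.sqrt(2) * w / f_rel   # from here on, re-execution saves energy
    d3 = 2 * w / f_inf                  # from here on, re-execution speed is capped by f_inf
    if D < d0:
        return None
    if D < d1:
        f = w / D
        return (1, f, w * f ** 2)
    if D < d2:
        return (1, f_rel, w * f_rel ** 2)
    if D < d3:
        f = 2 * w / D
        return (2, f, 2 * w * f ** 2)
    return (2, f_inf, 2 * w * f_inf ** 2)
```

The energy of one execution of weight $w$ at speed $f$ is computed as $w f^2$, i.e., $f^3$ per time unit during $w/f$ time units.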

Tri-Crit-Cont for n independent tasks on n processors can be solved in polynomial time. For n independent 
tasks mapped on n distinct processors, decisions for each task can be made independently, and we simply solve n times 
the previous single task problem. The minimum energy is the sum of the minimum energies obtained for each task.

Tri-Crit-Cont-Fork. For a fork, we need to decide how to share the deadline between the source $T_0$ of the fork and the other tasks (i.e., $n$ independent tasks on $n$ processors). We search the optimal values $D_1$ and $D_2$ such that $D_1 + D_2 = D$, and the energy of executing $T_0$ within deadline $D_1$ plus the energy of executing all other tasks within $D_2$ is minimum. Therefore, we just need to find the optimal value for $D_2$ (since $D_1 = D - D_2$), and reuse previous results for independent tasks.

Independently of $D$, we can define for each task $T_i$ four values $D_i^{(0)}$, $D_i^{(1)}$, $D_i^{(2)}$ and $D_i^{(3)}$, as in the case of a single task. There is a solution if and only if $\max_{1 \le i \le n} D_i^{(0)} \le D_2 \le D - D_0^{(0)}$. Then, the energy consumption depends upon the intervals delimited by values $D - D_0^{(j)}$ and $D_i^{(j)}$, for $1 \le i \le n$ and $j = 1, 2, 3$. Within an interval, the energy consumed by the source is either a constant, or a constant times $\frac{1}{(D-D_2)^2}$, and the energy consumed by task $T_i$ ($1 \le i \le n$) is either a constant, or a constant times $\frac{1}{D_2^2}$. All the constants are known, depend only on $T_i$, and are obtained by the algorithm that gives the optimal solution to Tri-Crit-Cont for a single task. To obtain the intervals, we sort the $4n$ values of $D_i^{(j)}$ ($i > 0$) and the four values of $D - D_0^{(j)}$, with $j = 0, 1, 2, 3$, and rename these $4(n+1)$ values as $d_k$, with $1 \le k \le 4(n+1)$ and $d_k \le d_{k+1}$. Given the bounds on $D_2$, we consider the intervals of the form $[d_k, d_{k+1}]$, with $d_k \ge \max_{1 \le i \le n} D_i^{(0)}$, and $d_{k+1} \le D - D_0^{(0)}$. On each of these intervals, the energy function is $\frac{K}{(D-D_2)^2} + \frac{K'}{D_2^2} + K''$, where $K$, $K'$ and $K''$ are positive constants that can be obtained in polynomial time by the solution to Tri-Crit-Cont for a single task. Finding a minimum to this function on the interval $[d_k, d_{k+1}]$ can be done in polynomial time:



• the first derivative of this function is $\frac{2K}{(D-D_2)^3} - \frac{2K'}{D_2^3}$;

• the function is convex on $]0, D[$, indeed the second derivative of this function is $\frac{6K}{(D-D_2)^4} + \frac{6K'}{D_2^4}$, which is positive on $]0, D[$, and therefore on the interval $[d_k, d_{k+1}]$, there is exactly one minimum to the energy function ($d_k > 0$ and $d_{k+1} < D$);

• the minimum is obtained either when the first derivative is equal to zero in the interval (i.e., if there is a solution to the equation $2K D_2^3 - 2K'(D - D_2)^3 = 0$ in $[d_k, d_{k+1}]$), or the minimum is reached at $d_k$ (resp. $d_{k+1}$) if the first derivative is positive (resp. negative) on the interval.

There are $O(n)$ intervals, and it takes constant time to find the minimum energy $E_k$ within interval $[d_k, d_{k+1}]$, as explained above, by solving one equation. Since we have partitioned the interval of possible deadlines $D_2 \in \left[\max_{1 \le i \le n} D_i^{(0)}, D - D_0^{(0)}\right]$, and obtained the minimum energy consumption in each sub-interval, the minimum energy consumption for the fork graph is $\min_k E_k$, and the value of $D_2$ is obtained where the minimum is reached. Once we know the optimal value of $D_2$, it is easy to reconstruct the solution, following the algorithm for a single task, in polynomial time. □
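The per-interval minimization used in the proof can be illustrated as follows. This is a sketch under stated assumptions, not the authors' implementation: the function name `minimize_on_interval` and its parameters are hypothetical, the energy on the interval is taken to be $K/(D-D_2)^2 + K'/D_2^2 + K''$, and the degenerate cases $K = 0$ or $K' = 0$ (a term that is constant on the interval) are handled separately as an added convenience. Setting the first derivative to zero gives $K D_2^3 = K'(D-D_2)^3$, hence $D_2 = D / (1 + (K/K')^{1/3})$, and convexity justifies clamping this value to the interval.

```python
def minimize_on_interval(K, K_prime, K_second, D, lo, hi):
    """Minimize E(x) = K/(D-x)^2 + K'/x^2 + K'' for x = D_2 in [lo, hi],
    assuming 0 < lo <= hi < D and K, K' >= 0. Returns (x, E(x))."""
    def energy(x):
        return K / (D - x) ** 2 + K_prime / x ** 2 + K_second

    if K == 0 and K_prime == 0:
        x = lo                      # constant function: any point is optimal
    elif K == 0:
        x = hi                      # only K'/x^2 remains, decreasing in x
    elif K_prime == 0:
        x = lo                      # only K/(D-x)^2 remains, increasing in x
    else:
        # Stationary point: K x^3 = K' (D - x)^3
        r = (K / K_prime) ** (1.0 / 3.0)
        x_star = D / (1.0 + r)
        # Convexity: clamp the unconstrained minimizer to [lo, hi]
        x = min(max(x_star, lo), hi)
    return x, energy(x)
```

Running this routine on each of the $O(n)$ intervals and keeping the smallest value mirrors the enumeration described in the proof.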

Note that this algorithm does not provide any closed-form formula for the speeds of the tasks, and that there is an 
intricate case analysis due to the reliability constraints. 

If we further assume that the fork is made of identical tasks (i.e., $w_i = w$ for $0 \le i \le n$), then we can provide a closed-form formula. However, Proposition 2 illustrates the inherent difficulty of this simple problem, with several cases to consider depending on the values of the deadline, and also the bounds on speeds ($f_{\min}$, $f_{\max}$, $f_{rel}$, etc.). First, since the tasks all have the same weight $w_i = w$, we get rid of the $f_i^{(inf)}$ introduced above, since they are all identical (see the definition of $f_i^{(inf)}$ in Lemma 2): $f_i^{(inf)} = f^{(inf)}$ for $0 \le i \le n$. Therefore we let $f_{\min} = \max(f_{\min}, f^{(inf)})$ in the proposition below:

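Proposition 2. In the optimal solution of Tri-Crit-Cont-Fork with at least three identical tasks (and hence $n \ge 2$), there are only three possible scenarios: (i) no task is re-executed; (ii) the $n$ successors are all re-executed but not the source; (iii) all tasks are re-executed. In each scenario, the source is executed at speed $f_{src}$ (once or twice), and the $n$ successors are executed at the same speed $f_{leaf}$ (once or twice).

For a deadline $D < \frac{2w}{f_{\max}}$, there is no solution. For a deadline $D \in \left[\frac{2w}{f_{\max}}, \frac{w}{f_{rel}} \frac{(1+2n^{1/3})^{3/2}}{\sqrt{1+n}}\right]$, no task is re-executed (scenario (i)) and the values of $f_{src}$ and $f_{leaf}$ are the following:

• if $\frac{2w}{f_{\max}} \le D \le \min\left(\frac{w}{f_{\max}}(1+n^{1/3}), w\left(\frac{1}{f_{\max}}+\frac{1}{f_{rel}}\right)\right)$, then $f_{src} = f_{\max}$ and $f_{leaf} = \frac{w}{D f_{\max} - w} f_{\max}$;

• if $\frac{w}{f_{\max}}(1+n^{1/3}) \le w\left(\frac{1}{f_{\max}}+\frac{1}{f_{rel}}\right)$, then

- if $\frac{w}{f_{\max}}(1+n^{1/3}) < D \le \frac{w}{f_{rel}} \frac{1+n^{1/3}}{n^{1/3}}$, then $f_{src} = \frac{w}{D}(1+n^{1/3})$ and $f_{leaf} = \frac{w}{D} \frac{1+n^{1/3}}{n^{1/3}}$;

- if $\frac{w}{f_{rel}} \frac{1+n^{1/3}}{n^{1/3}} < D \le \frac{2w}{f_{rel}}$, then $f_{src} = \frac{w}{D f_{rel} - w} f_{rel}$ and $f_{leaf} = f_{rel}$;

• if $\frac{w}{f_{\max}}(1+n^{1/3}) > w\left(\frac{1}{f_{\max}}+\frac{1}{f_{rel}}\right)$, then

- if $w\left(\frac{1}{f_{\max}}+\frac{1}{f_{rel}}\right) < D \le \frac{2w}{f_{rel}}$, then $f_{src} = \frac{w}{D f_{rel} - w} f_{rel}$ and $f_{leaf} = f_{rel}$;

• if $\frac{2w}{f_{rel}} < D \le \frac{w}{f_{rel}} \frac{(1+2n^{1/3})^{3/2}}{\sqrt{1+n}}$, then $f_{src} = f_{leaf} = f_{rel}$.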

Note that for larger values of $D$, depending on $f_{\min}$, we can move to scenarios (ii) and (iii) with partial or total re-execution. The case analysis becomes even more painful, but remains feasible. Intuitively, the property that all tasks have the same weight is the key to obtaining analytical formulas, because all tasks have the same minimum re-execution speed $f^{(inf)}$ (see Lemma 2).
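To make the case analysis of scenario (i) concrete, here is a small Python sketch for a fork of $n+1$ identical tasks of weight $w$ with no re-execution. The function name `fork_speeds_no_reexec` and its arguments are hypothetical, and the thresholds are the ones obtained by writing $f_{leaf} = \frac{w}{D f_{\max} - w} f_{\max}$ when the source runs at $f_{\max}$, and $f_{src} = \frac{w}{D f_{rel} - w} f_{rel}$ when the leaves are held at $f_{rel}$. The sketch does not check the upper threshold on $D$ beyond which re-execution may become preferable.

```python
def fork_speeds_no_reexec(w, n, D, f_max, f_rel):
    """Return (f_src, f_leaf) when every task is executed exactly once,
    or None if the deadline cannot be met even at f_max."""
    c = n ** (1.0 / 3.0)
    if D < 2 * w / f_max:
        return None
    if D >= 2 * w / f_rel:
        # Enough time to run everything at the reliability speed
        return (f_rel, f_rel)
    d_src = w / f_max * (1 + c)               # beyond this, the source can slow down
    d_leaf = w * (1 / f_max + 1 / f_rel)      # beyond this, leaves would drop below f_rel
    if D <= min(d_src, d_leaf):
        return (f_max, w / (D * f_max - w) * f_max)
    if d_src <= d_leaf and D <= w / f_rel * (1 + c) / c:
        # Unconstrained optimum: neither f_max nor f_rel is binding
        return (w / D * (1 + c), w / D * (1 + c) / c)
    # Leaves are held at f_rel; the source uses the remaining time
    return (w / (D * f_rel - w) * f_rel, f_rel)
```

In every feasible case with $D \le 2w/f_{rel}$, the returned speeds use the whole deadline, i.e., $w/f_{src} + w/f_{leaf} = D$, which is a convenient sanity check.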

Proof. First, we recall preliminary results:

• if a task is executed only once at speed $f$, then $f_{rel} \le f \le f_{\max}$;

• if a task is re-executed, then both executions are done at the same speed $f$, and $f_{\min} \le f < \frac{1}{\sqrt{2}} f_{rel}$.

By hypothesis, all tasks are identical: the bound on re-execution speed accounts for $f^{(inf)}$ as in Lemma 2, since we now have $f_{\min} = \max(f_{\min}, f^{(inf)})$. Therefore, if two tasks of same weight $w$ have the same energy consumption in the optimal solution, then they are executed the same number of times (once or twice) and at the same speed(s). If the energy is greater than or equal to $w f_{rel}^2$, then necessarily there is one execution; and if it is lower than $w f_{rel}^2$, then necessarily there are two executions.

First, we prove that in any solution, the energy consumed for the execution of each successor task, also called 
leaf, is the same. If it was not the case, since each task has the same weight, and since each leaf is independent from 
the other and only dependent on the source of the fork, if a leaf Ti is consuming more than another leaf Tj, then we 
could execute $T_i$ the same number of times and at the same speed as $T_j$, hence matching the deadline bound and the reliability constraint, and obtaining a better solution. Thanks to this result, we now assume that all leaves are executed at the same speed(s), denoted $f_{leaf}$. The source task may be executed at a different speed, $f_{src}$.

Next, let us show that the energy consumption of the source is always greater than or equal to that of any leaf in 
any optimal solution. First, since the source and leaves have the same weight, if we invert the execution speeds of the 
source and of the leaves, then the reliability of each task is still matched, and so is the execution time. Moreover, the 
energy consumption is equal to the energy consumption of the source plus n times the energy consumption of any leaf 
(recall that they all consume the same amount of energy). Hence, if the energy consumption of the source is smaller 
than the one of the leaves, permuting those execution speeds would reduce by $(n - 1) \times \Delta$ the energy, where $\Delta$ is the positive difference between the two energy consumptions. Thanks to this result, we can say that the source should never be executed twice if the leaves are executed only once since it would mean a lower energy consumption for the source (recall that $n \ge 2$).

This result fully characterizes the shape of any optimal solution. There are only three possible scenarios: (i) no task is re-executed; (ii) the $n$ successors (leaves) are all re-executed but not the source; (iii) all tasks are re-executed. We study independently the three scenarios, i.e., we aim at determining the values of $f_{src}$ and $f_{leaf}$ in each case. Conditions on the deadline indicate the shape of the solution, and we perform the case analysis for deadlines $D \le \frac{w}{f_{rel}} \frac{(1+2n^{1/3})^{3/2}}{\sqrt{1+n}}$.

Let us assume first that the optimal solution is such that each task is executed only once (scenario (i)). From the proof of Theorem 1 in [3], we obtain the optimal speeds with no re-execution and without accounting for reliability; they are given by the following formulas:

• if $D < \frac{2w}{f_{\max}}$, then there is no solution, since the tasks executed at $f_{\max}$ exceed the deadline;

• if $\frac{2w}{f_{\max}} \le D \le \frac{w}{f_{\max}}(1+n^{1/3})$, then $f_{src} = f_{\max}$ and $f_{leaf} = \frac{w}{D f_{\max} - w} f_{\max}$;

• if $\frac{w}{f_{\max}}(1+n^{1/3}) < D$, then $f_{src} = \frac{w}{D}(1+n^{1/3})$ and $f_{leaf} = \frac{w}{D} \frac{1+n^{1/3}}{n^{1/3}}$.

Since there is a minimum speed $f_{rel}$ to match the reliability constraint, some of these items must be amended when they would lead to $f_{leaf} < f_{rel}$. Note that in all cases, if $D \ge \frac{2w}{f_{rel}}$, then both the source and the leaves are executed at speed $f_{rel}$, i.e., $f_{src} = f_{leaf} = f_{rel}$ (recall that we consider the case with no re-execution).

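• If $\frac{2w}{f_{\max}} \le D \le \frac{w}{f_{\max}}(1+n^{1/3})$, then we need $f_{leaf} = \frac{w}{D f_{\max} - w} f_{\max} \ge f_{rel}$, hence the condition: $D \le w\left(\frac{1}{f_{\max}} + \frac{1}{f_{rel}}\right)$.

• If $D > \min\left(\frac{w}{f_{\max}}(1+n^{1/3}), w\left(\frac{1}{f_{\max}} + \frac{1}{f_{rel}}\right)\right)$, the previous results do not hold anymore because of the constraint on the speed of the leaves. We must further differentiate cases, depending on where the minimum is reached.

• If $\frac{w}{f_{\max}}(1+n^{1/3}) \le w\left(\frac{1}{f_{\max}} + \frac{1}{f_{rel}}\right)$, then

- if $\frac{w}{f_{\max}}(1+n^{1/3}) < D \le \frac{w}{f_{rel}} \frac{1+n^{1/3}}{n^{1/3}}$, we are in the third case with no reliability, and therefore $f_{src} = \frac{w}{D}(1+n^{1/3})$ and $f_{leaf} = \frac{w}{D} \frac{1+n^{1/3}}{n^{1/3}}$; the upper bound on $D$ guarantees that $f_{leaf} \ge f_{rel}$, while the lower bound on $D$ guarantees that $f_{src} \le f_{\max}$;

- if $\frac{w}{f_{rel}} \frac{1+n^{1/3}}{n^{1/3}} < D \le \frac{2w}{f_{rel}}$, then the speed of the leaves is constrained by $f_{rel}$, and we obtain $f_{leaf} = f_{rel}$ and $f_{src} = \frac{w}{D f_{rel} - w} f_{rel}$. From the lower bound on $D$, we obtain $f_{src} < n^{1/3} f_{rel}$, and since $\frac{w}{f_{\max}}(1+n^{1/3}) \le w\left(\frac{1}{f_{\max}} + \frac{1}{f_{rel}}\right)$, we have $f_{src} < n^{1/3} f_{rel} \le f_{\max}$.

• If $\frac{w}{f_{\max}}(1+n^{1/3}) > w\left(\frac{1}{f_{\max}} + \frac{1}{f_{rel}}\right)$, then for $w\left(\frac{1}{f_{\max}} + \frac{1}{f_{rel}}\right) < D \le \frac{2w}{f_{rel}}$, the leaves should be executed at speed $f_{leaf} = f_{rel}$, and for the source, $f_{src} = \frac{w}{D f_{rel} - w} f_{rel}$. Note that the lower bound on $D$ is equivalent to $\frac{w}{D f_{rel} - w} f_{rel} < f_{\max}$, and hence the speed of the source does not exceed $f_{\max}$.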

As stated above, if $D \ge \frac{2w}{f_{rel}}$, both the source and the leaves are executed at speed $f_{rel}$ (with no re-execution). However, if the deadline is larger, re-execution will be used by the optimal solution (i.e., it will become scenario (ii)). Let us consider therefore the scenario in which leaves are re-executed, to compare the energy consumption with the first scenario. In this case, we consider an equivalent fork in which leaves are of weight $2w$, and a schedule with no re-execution. Then the optimal solution when there is no maximum speed is:

$$f_{src} = \frac{w}{D}(1+2n^{1/3}) \quad \text{and} \quad f_{leaf} = \frac{w}{D} \frac{1+2n^{1/3}}{n^{1/3}}.$$

If $f_{leaf} \ge \frac{1}{\sqrt{2}} f_{rel}$, then there is a better solution to the original problem without re-execution. Indeed, the solution in which the leaves (of weight $w$) are executed once at speed $f'_{leaf} = \max(f_{leaf}, f_{rel})$ is such that:

• the reliability constraint is matched ($f'_{leaf} \ge f_{rel}$);

• the deadline constraint is matched ($f'_{leaf} \ge f_{leaf}$, and $f_{leaf}$ corresponds to the solution with re-execution, i.e., $w/f_{src} + 2w/f_{leaf} \le D$);

• the energy consumption is better, as stated by Lemma 2 if $f'_{leaf} = f_{rel}$.

Therefore, we are in scenario (ii) when $f_{leaf} < \frac{1}{\sqrt{2}} f_{rel}$, i.e., $D > \frac{w}{f_{rel}} \sqrt{2}\, \frac{1+2n^{1/3}}{n^{1/3}}$.

Moreover, depending on whether $f_{src} \ge f_{rel}$ or $f_{src} < f_{rel}$:

• if $f_{src} \ge f_{rel}$, i.e., $D \le \frac{w}{f_{rel}}(1+2n^{1/3})$, then the solution is valid;

• if $f_{src} < f_{rel}$, then we must in fact have $f_{src} = f_{rel}$, and then $f_{leaf} = \max\left(\frac{2w}{D f_{rel} - w} f_{rel}, f_{\min}\right)$.

Note that these values do not take into account the constraints $f_{\max}$ and $f_{\min}$. Therefore, they are lower bounds on the energy consumption when the leaves are re-executed.

Finally, we establish a bound $D_0$ on the deadline: for larger values than $D_0$, we cannot guarantee that re-execution will not be used by the optimal solution, and hence we will have fully characterized the cases for deadlines smaller than $D_0$. Since we have only computed lower bounds on energy consumption for the scenario (ii), this bound will not be tight. We know that the minimum energy consumption is a function decreasing with the deadline: if $D > D'$, then any solution for $D'$ is a solution for $D$. Let us find the minimum deadline $D$ such that the energy when the leaves are re-executed is smaller than the energy when no task is re-executed.

As we have seen before, necessarily if $D \le \frac{w}{f_{rel}} \sqrt{2}\, \frac{1+2n^{1/3}}{n^{1/3}}$, then it is better to have no re-execution, i.e., $D_0 \ge \frac{w}{f_{rel}} \sqrt{2}\, \frac{1+2n^{1/3}}{n^{1/3}}$. Let $D = \frac{w}{f_{rel}} \sqrt{2}\, \frac{1+2n^{1/3}}{n^{1/3}} + \varepsilon$. We suppose also that $D \le \frac{w}{f_{rel}}(1+2n^{1/3})$, i.e., the solution with re-execution is valid ($f_{src} \ge f_{rel}$).

• The energy consumption when the leaves are re-executed is greater than

$$E_2 = \frac{w^3 (1+2n^{1/3})^3}{D^2}.$$

• With no re-execution, the deadline is large enough so that each task can be executed at speed $f_{rel}$, and therefore the energy consumption is

$$E_1 = (1+n)\, w f_{rel}^2 = 2 \frac{w^3}{(D-\varepsilon)^2} \left(\frac{1+2n^{1/3}}{n^{1/3}}\right)^2 (1+n).$$

We now check the condition $E_1 \le E_2$:

$$\begin{aligned}
2 \frac{w^3}{(D-\varepsilon)^2} \left(\frac{1+2n^{1/3}}{n^{1/3}}\right)^2 (1+n) &\le \frac{w^3}{D^2} (1+2n^{1/3})^3 \\
\frac{2}{(D-\varepsilon)^2} \frac{1+n}{n^{2/3}} &\le \frac{1+2n^{1/3}}{D^2} \\
\frac{D^2}{(D-\varepsilon)^2} &\le \frac{n^{2/3} + 2n}{2 + 2n} \\
D &\le \frac{w}{f_{rel}} \frac{(1+2n^{1/3})^{3/2}}{\sqrt{1+n}} = D_0
\end{aligned}$$

Furthermore, note that $D_0 \le \frac{w}{f_{rel}}(1+2n^{1/3})$ for $n > 2$, hence the hypothesis that $f_{src} \ge f_{rel}$ is valid for the values considered. Finally, if the deadline is smaller than the threshold value $D_0$, then we can guarantee that the optimal solution will not do any re-execution. However, if the deadline is larger, we do not know what happens (but it can be computed as a function of $f_{\min}$, $f_{\max}$ and $f_{rel}$). □

Beyond the case analysis itself, the result of Proposition 2 is interesting: we observe that in all cases, the source task is executed faster than the other tasks. This shows that Proposition 1 does not hold for general DAGs, and suggests that some tasks may be more critical than others. A hierarchical approach, which categorizes tasks with different priorities, will guide the design of type B heuristics in Section 5.



4 Vdd-Hopping model 

Contrary to the CONTINUOUS model, the Vdd-Hopping model uses discrete speeds. A processor can choose among a set $\{f_1, \dots, f_m\}$ of possible speeds. A task can be executed at different speeds.

Let $\alpha_{(i,j)}$ be the time of computation of task $T_i$ at speed $f_j$. The execution time of a task $T_i$ is $Exe(T_i) = \sum_{j=1}^{m} \alpha_{(i,j)}$, and the energy consumed during the execution is $E_i = \sum_{j=1}^{m} \alpha_{(i,j)} f_j^3$. Finally, for the reliability, the approximation used in Equation (2) still holds. However, the reliability of a task is now the product of the reliabilities for each time interval with constant speed, hence $R_i = \prod_{j=1}^{m} \left(1 - \lambda_0 e^{-d f_j} \alpha_{(i,j)}\right)$. Using a first order approximation, we obtain

$$R_i = 1 - \lambda_0 \sum_{j=1}^{m} e^{-d f_j} \alpha_{(i,j)} = 1 - \lambda_0 \sum_{j=1}^{m} h_j \alpha_{(i,j)}, \quad \text{where } h_j = e^{-d f_j},\ 1 \le j \le m. \qquad (5)$$
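As an illustration of these definitions, the following Python sketch evaluates the execution time, amount of work, energy and first-order reliability of a single task under the Vdd-Hopping model. The function name `vdd_task_metrics` and its parameters are hypothetical; the reliability uses the first-order form $1 - \lambda_0 \sum_j e^{-d f_j} \alpha_j$, with `alphas[j]` the time spent at `speeds[j]`.

```python
import math

def vdd_task_metrics(alphas, speeds, lambda0, d):
    """Return (execution time, work done, energy, first-order reliability)
    for one task that spends alphas[j] time units at speed speeds[j]."""
    exe_time = sum(alphas)
    work = sum(a * f for a, f in zip(alphas, speeds))
    energy = sum(a * f ** 3 for a, f in zip(alphas, speeds))
    failure = lambda0 * sum(math.exp(-d * f) * a for a, f in zip(alphas, speeds))
    return exe_time, work, energy, 1.0 - failure
```

For a task of weight $w_i$, a valid assignment of the `alphas` must make the returned amount of work equal to $w_i$.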

We first show that only two different speeds are needed for the execution of a task. This result was already known 
for the bi-criteria problem makespan/energy, and it is interesting to see that reliability does not alter it: 

Proposition 3. With the Vdd-Hopping model, each task is computed using at most two different speeds. 

Proof. Suppose that a task is computed with three speeds, $f_1 < f_2 < f_3$, and let $h_j = e^{-d f_j}$, for $j = 1, 2, 3$. We show that we can get rid of one of those speeds. The proof will follow by induction. Let $\alpha_i$ be the time spent by the processor at speed $f_i$. We aim at replacing each $\alpha_i$ by some $\alpha'_i$ so that we have a better solution. The constraints write:

1. Deadline not exceeded: $\alpha_1 + \alpha_2 + \alpha_3 \ge \alpha'_1 + \alpha'_2 + \alpha'_3$. (6)

2. Same amount of work: $\alpha_1 f_1 + \alpha_2 f_2 + \alpha_3 f_3 = \alpha'_1 f_1 + \alpha'_2 f_2 + \alpha'_3 f_3$. (7)

3. Reliability preserved: $\alpha_1 h_1 + \alpha_2 h_2 + \alpha_3 h_3 \ge \alpha'_1 h_1 + \alpha'_2 h_2 + \alpha'_3 h_3$. (8)

4. Less energy spent: $\alpha_1 f_1^3 + \alpha_2 f_2^3 + \alpha_3 f_3^3 > \alpha'_1 f_1^3 + \alpha'_2 f_2^3 + \alpha'_3 f_3^3$. (9)

We show that $\alpha'_1 = \alpha_1 - \varepsilon_1$, $\alpha'_2 = \alpha_2 + \varepsilon_1 + \varepsilon_3$, and $\alpha'_3 = \alpha_3 - \varepsilon_3$ is a valid solution:

• Equation (6) is satisfied, since $\alpha_1 + \alpha_2 + \alpha_3 = \alpha'_1 + \alpha'_2 + \alpha'_3$.

• Equation (7) gives $\varepsilon_1 = \varepsilon_3 \frac{f_3 - f_2}{f_2 - f_1}$.

• Next we replace the $\alpha'_i$ and $\varepsilon_1$ in Equation (8) and we obtain $h_2 (f_3 - f_1) \le h_1 (f_3 - f_2) + h_3 (f_2 - f_1)$, which is always true by convexity of the exponential (since $h_j = e^{-d f_j}$).

• Finally, Equation (9) gives us $\varepsilon_1 f_1^3 + \varepsilon_3 f_3^3 > (\varepsilon_3 + \varepsilon_1) f_2^3$, which is necessarily true since $f_1 < f_2 < f_3$ and $f \mapsto f^3$ is convex (barycenter).

Since we want all the $\alpha'_i$ to be nonnegative, we take

$$\varepsilon_1 = \min\left(\alpha_1, \alpha_3 \frac{f_3 - f_2}{f_2 - f_1}\right) \quad \text{and} \quad \varepsilon_3 = \min\left(\alpha_3, \alpha_1 \frac{f_2 - f_1}{f_3 - f_2}\right).$$

We have either $\varepsilon_1 = \alpha_1$ or $\varepsilon_3 = \alpha_3$, which means that $\alpha'_1 = 0$ or $\alpha'_3 = 0$, and we can indeed compute the task with only two speeds, meeting the constraints and with a smaller energy. □
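The transformation used in this proof is easy to check numerically. The sketch below applies it to one task that uses three speeds $f_1 < f_2 < f_3$; the function name `drop_one_speed` is hypothetical, and the shifts are taken as $\varepsilon_1 = \min(\alpha_1, \alpha_3 \frac{f_3-f_2}{f_2-f_1})$ and $\varepsilon_3 = \min(\alpha_3, \alpha_1 \frac{f_2-f_1}{f_3-f_2})$, which keeps the amount of work and the total time unchanged.

```python
def drop_one_speed(alphas, speeds):
    """Given times (a1, a2, a3) spent at speeds f1 < f2 < f3, move time from
    the extreme speeds to the middle one until a1 or a3 reaches zero.
    Total time and total work are preserved."""
    a1, a2, a3 = alphas
    f1, f2, f3 = speeds
    ratio = (f3 - f2) / (f2 - f1)   # eps1 = eps3 * ratio preserves the work
    eps1 = min(a1, a3 * ratio)
    eps3 = min(a3, a1 / ratio)
    return (a1 - eps1, a2 + eps1 + eps3, a3 - eps3)
```

For instance, with times $(1, 0, 1)$ at speeds $(1, 2, 4)$, the slowest speed is eliminated: the work stays equal to 5 and the total time to 2, while the energy $\sum_j \alpha_j f_j^3$ drops from 65 to 44.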

We are now ready to assess the problem complexity: 

Theorem 3. The Tri-Crit-Vdd-Chain problem is NP-complete. 

The proof is similar to that of Theorem 1, assuming that there are only two available speeds, $f_{\min}$ and $f_{\max}$. Then we reduce the problem from SUBSET-SUM. Note that here again, the problem turns out to be NP-hard even with one single processor (linear chain of tasks).

Proof. Consider the associated decision problem: given an execution graph, $m$ possible speeds, a deadline, a reliability, and a bound on the energy consumption, can we find the time each task will spend at each speed such that the deadline, the reliability and the bound on energy are respected? The problem is clearly in NP: given the time spent in each speed for each task, computing the execution time, the reliability and the energy consumption can be done in polynomial time. To establish the completeness, we use a reduction from SUBSET-SUM [11]. Let $\mathcal{I}_1$ be an instance of SUBSET-SUM: given $n$ strictly positive integers $a_1, \dots, a_n$, and a positive integer $X$, does there exist a subset $I$ of $\{1, \dots, n\}$ such that $\sum_{i \in I} a_i = X$? Let $S = \sum_{i=1}^{n} a_i$.

We build the following instance $\mathcal{I}_2$ of our problem. The execution graph is a linear chain with $n$ tasks, where:

• task $T_i$ has weight $w_i = a_i$;

• the processor can run at $m = 2$ different speeds, $f_{\min}$ and $f_{\max}$;

• $\lambda_0 = \frac{f_{\max}}{100 \max_{i=1..n} a_i}$;

• $f_{\min} = \sqrt{\lambda_0 f_{\max} \max_{i=1..n} a_i} = \frac{f_{\max}}{10}$;

• $f_{rel} = f_{\max}$; $d = 0$.

The bounds on reliability, deadline and energy are:

• $R_i^0 = R_i(f_{rel}) = 1 - \lambda_0 \frac{a_i}{f_{\max}}$ for $1 \le i \le n$;

• $D_0 = \frac{2X}{f_{\min}} + \frac{S - X}{f_{\max}}$;

• $E_0 = 2X f_{\min}^2 + (S - X) f_{\max}^2$.

Clearly, the size of $\mathcal{I}_2$ is polynomial in the size of $\mathcal{I}_1$.

Suppose first that instance $\mathcal{I}_1$ has a solution, $I$. For all $i \in I$, $T_i$ is executed twice at speed $f_{\min}$. Otherwise, for all $i \notin I$, it is executed at speed $f_{\max}$ one time only. The execution time is $\sum_{i \in I} \frac{2a_i}{f_{\min}} + \sum_{i \notin I} \frac{a_i}{f_{\max}} = \frac{2X}{f_{\min}} + \frac{S - X}{f_{\max}} = D_0$. The reliability is met for all tasks not in $I$, since they are executed at speed $f_{rel}$. It is also met for all tasks in $I$: $\forall i \in I$, $1 - \lambda_0^2 \frac{a_i^2}{f_{\min}^2} \ge 1 - \lambda_0 \frac{a_i}{f_{\max}}$. The energy consumption is $E = \sum_{i \in I} 2a_i f_{\min}^2 + \sum_{i \notin I} a_i f_{\max}^2 = 2X f_{\min}^2 + (S - X) f_{\max}^2 = E_0$. All bounds are respected, and therefore the execution speeds are a solution to $\mathcal{I}_2$ (and each task keeps a constant speed during its whole execution).

Suppose now that $\mathcal{I}_2$ has a solution. Since we consider the Vdd-Hopping model, each execution can be run partly at speed $f_{\min}$, and partly at speed $f_{\max}$. However, tasks executed only once are necessarily executed only at maximum speed to match the reliability constraint.

Let $I = \{i \mid T_i \text{ is executed twice in the solution}\}$. Let $Y = \sum_{i \in I} a_i$. We have $2Y = Y_1 + Y_2$, where $Y_1$ is the total weight of each execution and re-execution ($2Y$) of tasks in $I$ that are executed at speed $f_{\min}$, and $Y_2$ the total weight that is executed at speed $f_{\max}$. We show that necessarily $Y_1 = 2X = 2Y$, i.e., no part of any task in $I$ is executed at speed $f_{\max}$.
First let us show that $2X \le 2Y$. The energy consumption of the solution of $\mathcal{I}_2$ is $E = Y_1 f_{\min}^2 + Y_2 f_{\max}^2 + (S - Y) f_{\max}^2 = Y_1 f_{\min}^2 + (S - Y_1 + Y) f_{\max}^2$. By differentiating this function (with regards to $Y_1$, $E' = f_{\min}^2 - f_{\max}^2 < 0$), we can see that the minimum is reached for $Y_1 = 2Y$ (since $Y_1 \in [0, 2Y]$). Then, for $Y_1 = 2Y$, since the solution is such that $E \le E_0$, we have $E - E_0 = (Y - X)(2 f_{\min}^2 - f_{\max}^2) \le 0$, and therefore $X \le Y$.

Next let us show that $Y_1 \le 2X$. Suppose by contradiction that $Y_1 > 2X$, then the execution time of the solution of $\mathcal{I}_2$ is $D = \frac{Y_1}{f_{\min}} + \frac{Y_2}{f_{\max}} + \frac{S - Y}{f_{\max}} = \frac{Y_1}{f_{\min}} + \frac{S + Y - Y_1}{f_{\max}}$. By differentiating this function (with regards to $Y_1$), we can see that it is strictly increasing when $Y_1$ goes from $2X$ to $2Y$. However, when $Y_1 = 2X + \varepsilon$, $D - D_0 = \varepsilon \left(\frac{1}{f_{\min}} - \frac{1}{f_{\max}}\right) + \frac{Y - X}{f_{\max}} > 0$ (indeed, the first term is strictly positive and the second one is nonnegative). Hence, $Y_1 \le 2X$.

Finally, let us show that $Y_1 = 2X = 2Y$. Since we have a solution to $\mathcal{I}_2$, we know that $E \le E_0$, and therefore $2X - Y_1 \ge (Y + X - Y_1) \frac{f_{\max}^2}{f_{\min}^2} \ge (Y + X - Y_1)$ (the last equality is only met when $Y + X - Y_1 = 0$). Hence $2X \ge X + Y$, which is only possible if $2X = X + Y$. This gives us the final result: $Y_1 = 2X = 2Y$ (all inequalities are tight).

We conclude that $\sum_{i \in I} a_i = X$, and therefore $\mathcal{I}_1$ has a solution. This concludes the proof. □
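The reduction can be exercised on small SUBSET-SUM instances. The Python sketch below is only an illustration under stated assumptions: the helper names `build_instance` and `subset_schedule_is_valid` are hypothetical, $f_{\max}$ is set to 1 by default, and the instance uses $\lambda_0 = f_{\max} / (100 \max_i a_i)$, $f_{\min} = f_{\max}/10$, $d = 0$, $D_0 = 2X/f_{\min} + (S-X)/f_{\max}$ and $E_0 = 2X f_{\min}^2 + (S-X) f_{\max}^2$. It checks only constant-speed schedules, in which the tasks of a candidate subset are executed twice at $f_{\min}$ and all others once at $f_{\max}$.

```python
def build_instance(a, X, f_max=1.0):
    """Build the scheduling instance associated with SUBSET-SUM (a, X)."""
    S = sum(a)
    f_min = f_max / 10
    return {
        "a": list(a),
        "f_min": f_min,
        "f_max": f_max,
        "lambda0": f_max / (100 * max(a)),
        "D0": 2 * X / f_min + (S - X) / f_max,
        "E0": 2 * X * f_min ** 2 + (S - X) * f_max ** 2,
    }

def subset_schedule_is_valid(inst, subset, tol=1e-9):
    """Tasks whose index is in `subset` run twice at f_min, the others once at
    f_max. Return True if deadline, energy and reliability bounds all hold."""
    f_min, f_max, lam = inst["f_min"], inst["f_max"], inst["lambda0"]
    time = energy = 0.0
    for i, a_i in enumerate(inst["a"]):
        allowed_failure = lam * a_i / f_max          # 1 - R_i(f_rel), with d = 0
        if i in subset:
            time += 2 * a_i / f_min
            energy += 2 * a_i * f_min ** 2
            failure = (lam * a_i / f_min) ** 2       # both executions must fail
        else:
            time += a_i / f_max
            energy += a_i * f_max ** 2
            failure = allowed_failure
        if failure > allowed_failure + tol:
            return False
    return time <= inst["D0"] + tol and energy <= inst["E0"] + tol
```

With $a = (3, 5, 2)$ and $X = 5$, the subsets $\{5\}$ and $\{3, 2\}$ yield valid schedules, whereas a subset summing to less than $X$ violates the energy bound and a subset summing to more than $X$ violates the deadline.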

In the following, we propose some polynomial time heuristics to tackle the general tri-criteria problem. While these heuristics are designed for the CONTINUOUS model, they can be easily adapted to the Vdd-Hopping model thanks to Proposition 3.

5 Heuristics for Tri-Crit-Cont 

In this section, building upon the theoretical results of Section 3, we propose some polynomial time heuristics for the Tri-Crit-Cont problem, which was shown NP-hard (see Theorem 1). Recall that the mapping of the tasks onto the processors is given, and we aim at reducing the energy consumption by exploiting re-execution and speed scaling, while meeting the deadline bound and all reliability constraints.

The first idea is inspired by Proposition 1: first we search for the optimal solution of the problem instance without re-execution, a phase that we call deceleration: we slow down some tasks if it can save energy without violating one of the constraints. Then we refine the schedule and choose the tasks that we want to re-execute, according to some criteria. We call type A heuristics such heuristics that obey this general scheme: first deceleration then re-execution. Type A heuristics are expected to be efficient on a DAG with a low degree of parallelism (optimal for a chain).

However, Proposition 2 (with fork graphs) shows that it might be better to re-execute highly parallel tasks before
decelerating. Therefore we introduce type B heuristics, which first choose the set of tasks to be re-executed, and 
then try to slow down the tasks that could not be re-executed. We need to find good criteria to select which tasks to 
re-execute, so that type B heuristics prove efficient for DAGs with a high degree of parallelism. In summary, type B 
heuristics obey the opposite scheme: first re-execution then deceleration. 

For both heuristic types, the approach for each phase can be sketched as follows. Initially, each task is executed once at speed $f_{\max}$. Then, let $d_i$ be the finish time of task $T_i$ in the current configuration:

- Deceleration: We select a set of tasks that we execute at speed $f_{dec} = \max\left(f_{rel}, \frac{\max_{i=1..n} d_i}{D} f_{\max}\right)$, which is the slowest possible speed meeting both the reliability and deadline constraints.

- Re-execution: We greedily select tasks for re-execution. The selection criterion is either by decreasing weights $w_i$, or by decreasing super-weights $W_i$. The super-weight of a task $T_i$ is defined as the sum of the weights of the tasks (including $T_i$) whose execution interval is included into $T_i$'s execution interval. The rationale is that the super-weight of a task that we slow down is an estimation of the total amount of work that can be slowed down together with that task, hence of the energy potentially saved: this corresponds to the total slack that can be reclaimed.
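The super-weight definition translates directly into code. The following Python sketch is a straightforward quadratic-time illustration, not the authors' OCaml implementation; the function name `super_weights` and the representation of a task as a `(start, finish, weight)` triple are assumptions made for the example.

```python
def super_weights(tasks):
    """tasks: list of (start, finish, weight) triples taken from the current
    schedule. The super-weight of task i is the total weight of all tasks
    (task i included) whose execution interval lies within that of task i."""
    result = []
    for start_i, finish_i, _ in tasks:
        total = sum(
            weight
            for start, finish, weight in tasks
            if start_i <= start and finish <= finish_i
        )
        result.append(total)
    return result
```

Sorting task indices by decreasing values of this list gives the order used by the SUS criterion.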

We introduce further notations before listing the heuristics: 

- SUS (Slack-Usage-Sort) is a function that sorts tasks by decreasing super-weights.

- ReExec is a function that tries to re-execute the current task $T_i$, at speed $f_{re-ex} = \frac{2c}{1+c} f_{rel}$, where $c = 4\sqrt{\frac{2}{7}} \cos\left(\frac{1}{3}\left(\pi - \tan^{-1}\frac{1}{\sqrt{7}}\right)\right) - 1$ ($\approx 0.2838$) (note that $f_{re-ex}$ is the optimal speed in the proof of Theorem 1). If it succeeds, it also re-executes at speed $f_{re-ex}$ all the tasks that are taken into account to compute the super-weight of $T_i$. Otherwise, it does nothing.

- ReExec&SlowDown performs the same re-executions as ReExec when it succeeds. But if the re-execution of the current task $T_i$ is not possible, it slows down $T_i$ as much as possible and does the same for all the tasks that are taken into account to compute the super-weight of $T_i$.

We now detail the heuristics:
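The constant used by ReExec can be evaluated numerically. The snippet below is a small check, assuming the constant reads $c = 4\sqrt{2/7}\,\cos\left(\frac{1}{3}(\pi - \tan^{-1}(1/\sqrt{7}))\right) - 1$ and $f_{re-ex} = \frac{2c}{1+c} f_{rel}$; this reading reproduces the stated value $c \approx 0.2838$. The names `C_REEXEC` and `reexec_speed` are hypothetical.

```python
import math

# Assumed closed form of the constant c (approximately 0.2838)
C_REEXEC = 4 * math.sqrt(2 / 7) * math.cos((math.pi - math.atan(1 / math.sqrt(7))) / 3) - 1

def reexec_speed(f_rel):
    """Speed used for both executions of a re-executed task."""
    return 2 * C_REEXEC / (1 + C_REEXEC) * f_rel
```

Under this reading, the re-execution speed is about $0.442\, f_{rel}$, which is below the $\frac{1}{\sqrt{2}} f_{rel} \approx 0.707\, f_{rel}$ bound required for re-execution to save energy.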
We now detail the heuristics: 

Hfmax. In this heuristic, tasks are simply executed once at maximum speed.

Hno-reex. In this heuristic, we do not allow any re-execution, and we simply consider the possible deceleration of the tasks. We set a uniform speed for all tasks, equal to $f_{dec}$, so that both the reliability and deadline constraints are matched. Note that heuristics Hfmax and Hno-reex are identical except for a constant ratio on the speeds of each task, $\frac{f_{\max}}{f_{dec}}$. Therefore, the energy ratio $\frac{E_{Hfmax}}{E_{Hno-reex}}$ is always equal to $\left(\frac{f_{\max}}{f_{dec}}\right)^2$ (for instance, if $f_{\max} = 1$ and $f_{dec} = 2/3$, then the energy ratio is equal to 2.25).

A.Greedy. This is a type A heuristic, where we first set the speed of each task to $f_{dec}$ (deceleration). Let Greedy-List be the list of all the tasks sorted according to decreasing weights $w_i$. Each task $T_i$ in Greedy-List is re-executed at speed $f_{re-ex}$ whenever possible. Finally, if there remains some slack at the end of the processing, we slow down both executions of each re-executed task as much as possible.

A.SUS-Crit. This is a type A heuristic, where we first set the speed of each task to $f_{dec}$. Let List-SW be the list of all tasks that belong to a critical path, sorted according to SUS. We apply ReExec to List-SW (re-execution). Finally we reclaim slack for re-executed tasks, similarly to the final step of A.Greedy.

B.Greedy. This is a type B heuristic. We use Greedy-List as in heuristic A.Greedy. We try to re-execute each task $T_i$ of Greedy-List when possible. Then, we slow down both executions of each re-executed task $T_i$ of Greedy-List as much as possible. Finally, we slow down, as much as possible, each task of Greedy-List that turns out not to be re-executed.

B.SUS-Crit. This is a type B heuristic. We use List-SW as in heuristic A.SUS-Crit. We apply ReExec to List-SW (re-execution). Then we run Heuristic B.Greedy.

B.SUS-Crit-Slow. This is a type B heuristic. We use List-SW, and we apply ReExec&SlowDown (re-execution). Then we use Greedy-List: for each task $T_i$ of Greedy-List, if there is enough time, we execute $T_i$ twice at speed $f_{re-ex}$ (re-execution); otherwise, we execute $T_i$ only once, at the slowest admissible speed.

Best. This is simply the minimum value over the seven previous heuristics, for reference. 

The complexity of all these heuristics is bounded by O(n^logn), where n is the number of tasks. The most
time-consuming operation is the computation of List-SW (the list of all elements belonging to a critical path, sorted 
according to SUS). 

6 Simulations 

In this section, we report extensive simulations to assess the performance of the heuristics presented in Section 5. The
heuristics were coded in OCaml. The source code is publicly available at ^ (together with additional results that were 
omitted due to lack of space).

6.1 Simulation settings

In order to evaluate the heuristics, we have generated DAGs using the random DAG generation library GGEN [9]. Since GGEN does not assign a weight to the tasks of the DAGs, we use a function that gives a random float value in the interval [0, 10]. Each simulation uses a DAG with 100 nodes and 300 edges. We observe similar patterns for other numbers of edges, see [2] for further information.

We apply a critical-path list scheduling algorithm to map the DAG onto the $p$ processors: we assign the most urgent ready task (with largest bottom-level) to the first available processor. The bottom-level is defined as $bl(T_i) = w_i$ if $T_i$ has no successor task, and $bl(T_i) = w_i + \max_{T_j} bl(T_j)$ otherwise, where the maximum is taken over the successors $T_j$ of $T_i$.
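The bottom-level recursion can be written compactly with memoization. The sketch below is an illustration in Python rather than the OCaml code used for the simulations; the function name `bottom_levels` and the dictionary-based DAG representation are assumptions made for the example.

```python
def bottom_levels(weights, successors):
    """weights: dict mapping each task to its weight.
    successors: dict mapping a task to the list of its successors (a DAG).
    Returns a dict mapping each task to its bottom-level."""
    levels = {}

    def visit(task):
        if task not in levels:
            # A task without successors has bottom-level equal to its weight
            tail = max((visit(s) for s in successors.get(task, [])), default=0)
            levels[task] = weights[task] + tail
        return levels[task]

    for task in weights:
        visit(task)
    return levels
```

The list scheduler then picks, among the ready tasks, the one with the largest bottom-level. For very deep DAGs the recursion could be replaced by a traversal in reverse topological order.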

We choose a reliability constant Aq = 10~^ (T] (we obtain identical results with other values, see below). Each 
reported result is the average on ten different DAGs with the same number of nodes and edges, and the energy con- 
sumption is normalized with the energy consumption returned by the Hno-reex heuristic. If the value is lower than 1, 
it means that we have been able to save energy thanks to re-execution. 

We analyze the influence of three different parameters: the tightness of the deadline $D$, the number of processors $p$, and the reliability speed $f_{rel}$. In fact, the absolute deadline $D$ is irrelevant, and we rather consider the deadline ratio $DeadlineRatio = \frac{D}{D_{\min}}$, where $D_{\min}$ is the execution time when executing each task once and at maximum speed $f_{\max}$ (heuristic Hfmax). Intuitively, when the deadline ratio is close to 1, there is almost no flexibility and it is difficult to re-execute tasks, while when the deadline ratio is larger we expect to be able to slow down and re-execute many tasks, thereby saving much more energy.



6.2 Simulation results 

First note that with a single processor, heuristics A.SUS-Crit and A.Greedy are identical, and heuristics B.SUS-Crit and B.Greedy are identical (by definition, the only critical path is the whole set of tasks).

Deadline ratio. In this set of simulations, we let p e {1, 10, 50, 70} and /^ei = |/max- Figure [T| reports results for 
$p = 1$ and $p = 50$. When $p = 1$, we see that the results are identical for all heuristics of type A, and identical for all heuristics of type B. As expected from Proposition 1, type A heuristics are better (see Figure 1a). With more processors (10, 50, 70), the results have the same general shape: see Figure 1b with 50 processors. When DeadlineRatio is small, type B heuristics are better. When DeadlineRatio increases up to 1.5, type A heuristics are closer to type B ones. Finally, when DeadlineRatio gets larger than 5, all heuristics converge towards the same result, where all tasks are re-executed.

Number of processors. In this set of simulations, we let DeadlineRatio $\in \{1.2, 1.6, 2, 2.4\}$ and $f_{rel}$ = |$f_{max}$. 
Figure 2 confirms that type A heuristics are particularly efficient when the number of processors is small, whereas 
type B heuristics are at their best when the number of processors is large. Figure 2a confirms the superiority of type B 
heuristics for tight deadlines, as was observed in Figure 1b. 

Reliability $f_{rel}$. In this set of simulations, we let $p \in \{1, 10, 50, 70\}$ and DeadlineRatio $\in \{1, 1.5, 3\}$. In Figure 3, 
there are four different curves: the line at 1 corresponds to Hno-reex and Hfmax, then come the heuristics of type A 
(that all obtain exactly the same results), then B.SUS-Crit and B.Greedy that also obtain the same results, and finally 
the best heuristic is B.SUS-Crit-Slow. Note that B.SUS-Crit and B.Greedy return the same results because they have 
the same behavior when DeadlineRatio = 1: there is no freedom of action on the critical paths. However, 
B.SUS-Crit-Slow gives better results because of the way it decelerates the important tasks that cannot be re-executed. When 
DeadlineRatio is really tight (equal to 1), decreasing the value of $f_{rel}$ from 1 to 0.9 makes a real difference with 
type B heuristics. We observe an energy gain of 10% when the number of processors is small (10 in Figure 3a) and of 
20% with more processors (50 in Figure 3b). 



Reliability constant $\lambda_0$. In Figure 4, we let $\lambda_0$ vary from 10 ^ to 10 ^, and observe very similar results throughout 
this range of values. Note that we did not plot Hfmax in this figure to improve readability. 

Figure 1: Comparative study when the deadline ratio varies. 

Figure 2: Comparative study when the number of processors p varies. 

Figure 3: Comparative study when the reliability $f_{rel}$ varies. 

Figure 4: Comparative study when $\lambda_0$ varies. 

6.3 Understanding the results 

A.SUS-Crit and A.Greedy, and B.SUS-Crit and B.Greedy, often obtain similar results, which might lead us to 
underestimate the importance of critical path tasks. However, the difference between B.SUS-Crit-Slow and B.SUS-Crit 
shows otherwise: tasks that belong to a critical path must be dealt with first. 

A striking result is the impact of both the number of processors and the deadline ratio on the effectiveness of the 
heuristics. Heuristics of type A, as suggested by Proposition 1, have much better results when there is a small number 
of processors. When the number of processors increases, there is a difference between small and large deadline ratios. 
In particular, when the deadline ratio is small, heuristics of type B have better results. Indeed, heuristics of type A 
try to accommodate as many tasks as possible, and as a consequence, no task can be re-executed. On the contrary, 
heuristics of type B try to favor some tasks that are considered as important. This is highly profitable when the deadline 
is tight. 

Note that all these heuristics take on average less than one ms to execute on one instance, which is very reasonable. 
The heuristics that compute the critical path (*.SUS-Crit-*) are the slowest, and may take up to two seconds when there 
are few processors. Indeed, the fewer processors there are, the more edges there are in the dependence graph once the 
task graph is mapped, which increases the complexity of finding the critical path. However, with more than ten processors, 
the running time never exceeds two ms. 

Altogether we have identified two very efficient and complementary heuristics, A.SUS-Crit and B.SUS-Crit-Slow. 
Taking the best result out of those two heuristics always gives the best result over all simulations. 
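
Combining the two heuristics amounts to keeping the lower of the two energy values, normalized as in the figures. The sketch below is only illustrative: the function name `best_normalized_energy` is hypothetical, the energy values are placeholders rather than simulation results, and a heuristic that finds no feasible schedule is assumed to report `None`.

```python
def best_normalized_energy(energies, baseline):
    """Return (heuristic name, energy / baseline) for the cheapest feasible heuristic.

    `energies` maps a heuristic name to the energy of its schedule,
    or to None when no feasible schedule was found (assumed convention).
    `baseline` is the energy of the reference schedule used for normalization.
    """
    feasible = {
        name: e / baseline for name, e in energies.items() if e is not None
    }
    if not feasible:
        raise ValueError("no heuristic produced a feasible schedule")
    best = min(feasible, key=feasible.get)
    return best, feasible[best]


# Placeholder values, for illustration only.
name, ratio = best_normalized_energy(
    {"A.SUS-Crit": 80.0, "B.SUS-Crit-Slow": 70.0}, baseline=100.0
)
```

A ratio below 1 then means that the selected schedule consumes less energy than the reference schedule.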

7 Conclusion 

In this paper, we have accounted for the energy cost associated with task re-execution in a more realistic and accurate way 
than the best-case model used in [24]. Coupling this energy model with the classical reliability model used in [21], 
we have been able to formulate a tri-criteria optimization problem: how to minimize the energy consumed given a 
deadline bound and a reliability constraint? The "antagonistic" relation between speed and reliability renders this 
tri-criteria problem much more challenging than the standard bi-criteria (makespan, energy) version. We have stated 
two variants of the problem, for processor speeds obeying either the CONTINUOUS or the Vdd-Hopping model. We 
have assessed the intractability of this tri-criteria problem, even in the case of a single processor. In addition, we have 
provided several complexity results for particular instances. 

We have designed and evaluated some polynomial-time heuristics for the Tri-Crit-Cont problem that are based 
on the failure probability, the task weights, and the processor speeds. These heuristics aim at minimizing the energy 
consumption while enforcing reliability and deadline constraints. They rely on dynamic voltage and frequency scaling 
(DVFS) to decrease the energy consumption. But because DVFS lowers the reliability of the system, the heuristics 
use re-execution to compensate for the loss. After running several heuristics on a wide class of problem instances, 
we have identified two heuristics that are complementary, and that together are able to produce good results on most 
instances. The good news is that these results bring the first efficient practical solutions to the tri-criteria optimization 
problem, despite its theoretically challenging nature. In addition, while the heuristics do not modify the mapping of 
the application, it is possible to couple them with a list scheduling algorithm, as was done in the simulations, in order 
to solve the more general problem in which the mapping is not already given. 

Future work involves several promising directions. On the theoretical side, it would be very interesting to prove a 
competitive ratio for the heuristic that takes the best out of A.SUS-Crit and B.SUS-Crit-Slow. However, this is quite 
challenging for arbitrary DAGs, and one may try to design approximation algorithms only for special graph 
structures, e.g., series-parallel graphs. Still, looking back at the complicated case analysis needed for an elementary 
fork-graph with identical weights (Proposition 2), we cannot underestimate the difficulty of this problem. 

While we have designed heuristics for the Tri-Crit-Cont model in this paper, we could easily adapt them to 
the Tri-Crit-Vdd model: for a solution given by a heuristic for Tri-Crit-Cont, if a task should be executed at 
the continuous speed $f$, then we would execute it at the two closest discrete speeds that bound $f$, while matching 
the execution time and reliability for this task. It remains to quantify the performance loss incurred by the latter 
constraints. 
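
The time-matching part of this adaptation can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the adaptation itself: the names `split_speed` and `energy` are hypothetical, only the execution time and the amount of work are matched (the reliability constraint mentioned above is not handled), and the energy estimate assumes that power grows as the cube of the speed.

```python
import bisect


def split_speed(w, f, speeds):
    """Split a task of weight w, planned at continuous speed f, over the two
    closest discrete speeds, keeping the same execution time w / f.

    Returns a list of (speed, duration) pieces.
    """
    speeds = sorted(speeds)
    if not speeds[0] <= f <= speeds[-1]:
        raise ValueError("f is outside the range of available speeds")
    k = bisect.bisect_left(speeds, f)
    if speeds[k] == f:
        return [(f, w / f)]  # f is already a discrete speed
    f_lo, f_hi = speeds[k - 1], speeds[k]
    total = w / f
    # Solve t_lo + t_hi = total and f_lo * t_lo + f_hi * t_hi = w.
    t_hi = total * (f - f_lo) / (f_hi - f_lo)
    t_lo = total - t_hi
    return [(f_lo, t_lo), (f_hi, t_hi)]


def energy(pieces, alpha=3):
    """Energy of a list of (speed, duration) pieces, assuming power = speed**alpha."""
    return sum(s ** alpha * t for s, t in pieces)


pieces = split_speed(w=6, f=1.5, speeds=[1, 2, 3])
overhead = energy(pieces) - energy([(1.5, 6 / 1.5)])
```

With a convex power function, the two-speed execution can never consume less energy than the continuous-speed execution of the same duration, so `overhead` gives a first estimate of the loss for one task.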

Finally, we point out that energy reduction and reliability will be even more important objectives with the advent 
of massively parallel platforms, made of a large number of clusters of multi-cores. More efficient solutions to the 
tri-criteria optimization problem (makespan, energy, reliability) could be achieved through combining replication with 
re-execution. A promising (and ambitious) research direction would be to search for the best trade-offs that can 
be achieved between these techniques that both increase reliability, but whose impact on execution time and energy 
consumption is very different. We believe that the comprehensive set of theoretical results and simulations given in 
this paper will provide solid foundations for further studies, and constitute a partial yet important first step for solving 
the problem at very large scale. 

Acknowledgments. A. Benoit and Y. Robert are with the Institut Universitaire de France. This work was supported in 